Abstract:
"Sound effects have become an extremely important measure of a game’s quality in today’s game
development industry and are integral to players’ reception of a game product. Despite this,
sourcing high-quality and creatively accurate sound effects requires either expensive audio
engineers or considerable time spent consolidating free assets. For the smaller, low-budget
developers who make up the majority of the industry, neither option is viable, and they often have
to sacrifice game quality by settling for sound effect assets that do not match their creative vision.
This presents a need for a Text-to-Audio system that can generate custom game sound effect assets
matching the developer’s exact request.
Existing AI solutions for generating audio assets are unsuitable for the game development industry.
They are too slow to regenerate assets rapidly for creative testing, often give unusable outputs, and
require manual audio editing even in the best of cases. For these reasons, they have not been adopted
by the industry, even by small developers who need such a solution. The proposed project
will overcome these shortcomings by using an existing Text-to-Audio generative model as a base and adapting
the output to meet the common needs of the game development industry using audio manipulation
techniques within a new audio post-processing module. The new system must be simpler to use,
produce batch outputs much faster for repeated testing, improve audio quality, ensure that generated
assets are atomic and can be used in games without manual editing, and fulfill game developers’
other auditory goals. The author believes these improvements will allow the generative model to
be used effectively in the industry and surpass existing solutions.
Initial test results have been positive, notably showing marked increases in perceived speed,
yield, quality, and suitability. Scores on subjective metrics, including overall quality (OVL) and
relevance to the text prompt (REL), have also been positive."