GameAudioLDM – Finetuned Text-To-Audio Generation for Sound Effects in the Game Development Industry

Pallemulla, Asith

dc.contributor.author	Pallemulla, Asith
dc.date.accessioned	2025-06-18T05:18:54Z
dc.date.available	2025-06-18T05:18:54Z
dc.date.issued	2024
dc.identifier.citation	Pallemulla, Asith (2024) GameAudioLDM – Finetuned Text-To-Audio Generation for Sound Effects in the Game Development Industry. BSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	20200853
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2638
dc.description.abstract	Sound effects have become an important measure of a game’s quality in today’s game development industry and are integral to players’ reception of a game. Despite this, sourcing high-quality, creatively accurate sound effects requires either expensive audio engineers or considerable time spent consolidating free assets. For the smaller, low-budget developers who make up the majority of the industry, neither option is viable, and they often have to sacrifice game quality by settling for sound effect assets that do not match their creative vision. This presents a need for a Text-to-Audio system that can generate custom game sound effect assets matching the developer’s exact request. Existing AI solutions for generating audio assets are unsuitable for the game development industry: they are too slow to rapidly regenerate assets for creative testing, often give unusable outputs, and require manual audio editing even in the best cases. For these reasons they have not been adopted by the industry, even by small developers who need such a solution. The proposed project will overcome this by using an existing Text-to-Audio generative model as a base and adapting its output to the common needs of the game development industry using audio manipulation techniques within a new audio post-processing module. The new system must be simpler to use, produce batch outputs much faster for repeated testing, increase audio quality, ensure generated assets are atomic and need no manual editing before use in games, and fulfill game developers’ other auditory goals. The author believes these improvements will allow the generative model to be used effectively in the industry and surpass existing solutions. Initial test results have been positive, notably showing marked increases in perceived speed, yield, quality, and suitability. Results on subjective metrics, including OVL and REL, have also been positive.	en_US
dc.language.iso	en	en_US
dc.subject	Audio synthesis	en_US
dc.subject	Audio processing	en_US
dc.subject	Game development	en_US
dc.title	GameAudioLDM – Finetuned Text-To-Audio Generation for Sound Effects in the Game Development Industry	en_US
dc.type	Thesis	en_US