TamilSpeech: Advancing TTS with Cutting-Edge Deep Learning Technologies

Siritharran, Kullaleeni

dc.contributor.author	Siritharran, Kullaleeni
dc.date.accessioned	2025-07-01T06:30:55Z
dc.date.available	2025-07-01T06:30:55Z
dc.date.issued	2024
dc.identifier.citation	Siritharran, Kullaleeni (2024) TamilSpeech: Advancing TTS with Cutting-Edge Deep Learning Technologies. MSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	20191240
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2814
dc.description.abstract	"This research focuses on a text-to-speech (TTS) system for the Tamil language built with cutting-edge deep learning technologies. The goal is to synthesize Tamil speech from given Tamil text. TTS systems have demonstrated their value by transforming written text into audible speech, benefiting diverse user groups. The emphasis on Tamil arises from the need to embrace linguistic diversity. Most significantly, TTS technology plays a vital role in helping visually impaired individuals navigate digital devices and manage their daily lives. Tamil is an ancient classical language spoken by 15% of Sri Lanka's population, and the worldwide Tamil-speaking population is estimated at around 80 million, making it one of the most widely spoken languages in the world. Its unique characters and diverse phonetic sounds pose challenges for automated speech synthesis. This study is motivated by the lack of research on Tamil text-to-speech applications that use cutting-edge deep learning technologies. The research involves the implementation of advanced TTS models, including Parallel Tacotron 2 and HiFi-GAN, to enhance the synthesis of natural-sounding Tamil speech. The models were built with the PyTorch neural network framework after a thorough analysis of the literature, including patent studies. Two models were built on the data to perform two different roles: a text encoder model built using Parallel Tacotron 2, and a speech decoder model built using HiFi-GAN to generate the speech waveform. A user interface was created as a Flask application to deliver this solution. Unit testing was carried out for this work, and the report presents training and validation loss as part of the evaluation."	en_US
dc.language.iso	en	en_US
dc.subject	Tamil Text-to-Speech (TTS)	en_US
dc.subject	Parallel Tacotron 2	en_US
dc.subject	HiFi-GAN	en_US
dc.title	TamilSpeech: Advancing TTS with Cutting-Edge Deep Learning Technologies	en_US
dc.type	Thesis	en_US