Abstract:
This paper investigates a novel Text-to-Speech (TTS) algorithm based on Deep Voice, an attention-based, fully convolutional architecture. Speech synthesis involves learning a statistical model of the human vocal production mechanism that is capable of taking text and vocalizing it as speech. The paper describes the approach taken, with accuracy and realism as its goals. Smoothness and fluency are the most important qualities expected of a TTS system. The aim is to give an outline of speech synthesis in the Sinhala language, and to summarize and reflect on the characteristics of the different synthesis techniques used. The proposed TTS system, which uses a neural-network-based approach to perform phonetic-to-acoustic mapping, is described with a view to its application in multilingual synthesizers.