Abstract:
"Language is a uniquely human form of communication, and speech is its most natural medium.
Many speech technologies are now available for a range of tasks, yet speech recognition and
summarization for low-resource languages remain active research areas. Sinhala is one such
low-resource language, as few resources for it are available on the internet.
Nowadays, people are reluctant to listen to long, continuous audio content. When they do listen,
they tend to skip through it to pick out information, and as a result they may come away with
the wrong picture. As a solution, the author proposes a system that summarizes continuous
Sinhala audio content, so that listeners can save valuable time while still getting the correct
information from the audio.
The system takes continuous audio files as input and generates a summary as output.
For the proposed system, the author fine-tuned the pre-trained Whisper model for Sinhala using
a transfer learning approach. On the test set, the model obtained a character error rate (CER)
of 0.3. The transcripts generated from the continuous audio files are then combined into a
single paragraph and passed to the summarization model, which scores sentences by word
frequency and produces the summary from the highest-scoring sentences.
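
The two quantities named above, CER and word-frequency sentence scoring, can be illustrated
with a minimal Python sketch. This is not the author's implementation: the function names
`cer` and `summarize`, the whitespace tokenizer, the sentence splitter, and the absence of
stop-word removal are all assumptions made for illustration. CER is taken here as the
character-level edit distance divided by the reference length, computed over Unicode code
points rather than grapheme clusters.

```python
import re
from collections import Counter

# Hypothetical helper names; the abstract does not give the actual implementation.
_PUNCT = ".,!?;:\"'()"


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def _tokens(text: str) -> list[str]:
    # Assumption: whitespace tokenization with edge punctuation stripped,
    # which keeps Sinhala words with combining vowel signs intact.
    return [w.strip(_PUNCT).lower() for w in text.split() if w.strip(_PUNCT)]


def summarize(text: str, num_sentences: int = 2) -> str:
    """Extractive summary: keep the sentences with the highest word-frequency score."""
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_tokens(text))

    def score(sentence: str) -> int:
        # Sum of document-level frequencies of the words in the sentence.
        return sum(freq[w] for w in _tokens(sentence))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Summing raw frequencies favors longer sentences; a practical system might normalize by
sentence length or remove stop words, neither of which is specified in the abstract.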
Keywords - Natural Language Processing, Speech Recognition, Extractive Summarization,
Audio Summarization
Subject Descriptors:
Computing methodologies → Artificial Intelligence → Natural Language Processing → Speech
Recognition
Computing methodologies → Artificial Intelligence → Natural Language Processing → Text
Summarization"