LLM - based Conversational Agent for Sinhala Romanized Language

Jothirathne, Himaranga Miniruwan

dc.contributor.author	Jothirathne, Himaranga Miniruwan
dc.date.accessioned	2025-06-18T09:03:42Z
dc.date.available	2025-06-18T09:03:42Z
dc.date.issued	2024
dc.identifier.issn	2019928
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2660
dc.description.abstract	"Many people around the world use open-domain LLM-based conversational agents such as ChatGPT and Bard in their day-to-day lives. However, most of these chatbots serve mainly English speakers. There is a community in Sri Lanka that knows only its official language, Sinhala, and its members cannot experience the true potential of these LLM-based chatbots because the backbone LLMs are not well adapted to low-resource languages such as Sinhala. This community should also be able to benefit from the capabilities of LLMs; only then can LLMs truly be considered socialized in Sri Lanka. This study attempts to give an LLM the ability to comprehend and generate Romanized Sinhala using the Retrieval-Augmented Generation (RAG) approach. The comprehension and generation capabilities of various open-source LLMs are analyzed. Experiments with Parameter-Efficient Fine-Tuning (PEFT) to adapt LLMs to Romanized Sinhala were also conducted as a comparison. The research aims to achieve state-of-the-art results in adapting LLMs to understand and generate Romanized Sinhala. A RAG architecture with the Gemini-Pro model as the text generation model gives the best BLEU and ROUGE scores for Romanized Sinhala content. A conversational dataset for Romanized Sinhala was also published for public use as part of this study."	en_US
dc.language.iso	en	en_US
dc.subject	Romanized Sinhala	en_US
dc.subject	Text Generation	en_US
dc.title	LLM - based Conversational Agent for Sinhala Romanized Language	en_US
dc.type	Thesis	en_US