| dc.contributor.author | Peiris, Y.N.S | |
| dc.date.accessioned | 2022-03-07T05:45:04Z | |
| dc.date.available | 2022-03-07T05:45:04Z | |
| dc.date.issued | 2021 | |
| dc.identifier.citation | Peiris, Y.N.S (2021) Sentiment Analysis for the Sinhala Language with BERT Based Language Model. BSc. Dissertation, Informatics Institute of Technology | en_US |
| dc.identifier.issn | 2017281 | |
| dc.identifier.uri | http://dlib.iit.ac.lk/xmlui/handle/123456789/849 | |
| dc.description.abstract | Sinhala, the native language of the Sinhalese people, is a low-resource language spoken by 16 million people in Sri Lanka. Due to the lack of resources, only a small amount of research has been conducted on sentiment analysis for the Sinhala language compared to languages such as English and Chinese. Most existing research has used lexicon- and dictionary-based approaches combined with classification algorithms. With advances in word embedding and deep learning techniques, recent research has begun to apply these techniques to sentiment analysis and text classification in the Sinhala language domain. More recent developments in Natural Language Processing (NLP), such as Bidirectional Encoder Representations from Transformers (BERT) based language models, which have achieved state-of-the-art results on a variety of NLP tasks, have not yet been applied to the Sinhala language domain. Therefore, we introduce a sentiment analysis model for the Sinhala language using a BERT-based language model known as Language-agnostic BERT Sentence Embedding (LaBSE). Classification is performed on both binary and multiclass datasets consisting of Sinhala news comments. The newly introduced model achieved an F1-score of 89.82% for binary classification and 64.72% for multiclass classification, surpassing existing results obtained using deep learning and static word embedding approaches. | en_US |
| dc.language.iso | en | en_US |
| dc.subject | Language Models | en_US |
| dc.subject | Deep Learning | en_US |
| dc.subject | Sentiment Analysis | en_US |
| dc.title | Sentiment Analysis for the Sinhala Language with BERT Based Language Model | en_US |
| dc.type | Thesis | en_US |