Digital Repository

Sentiment Analysis for the Sinhala Language with BERT Based Language Model

dc.contributor.author Peiris, Y.N.S
dc.date.accessioned 2022-03-07T05:45:04Z
dc.date.available 2022-03-07T05:45:04Z
dc.date.issued 2021
dc.identifier.citation Peiris, Y.N.S (2021) Sentiment Analysis for the Sinhala Language with BERT Based Language Model. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2017281
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/849
dc.description.abstract "Sinhala, the native language of the Sinhalese people, is a low-resource language spoken by 16 million people in Sri Lanka. Owing to this lack of resources, far less sentiment analysis research has been conducted for Sinhala than for languages such as English and Chinese. Most existing studies combine lexicon- and dictionary-based approaches with classification algorithms. With advances in word embedding and deep learning techniques, recent studies have begun applying these techniques to sentiment analysis and text classification in Sinhala. However, more recent developments in Natural Language Processing (NLP), such as Bidirectional Encoder Representations from Transformers (BERT) based language models, which have achieved state-of-the-art results on a variety of NLP tasks, have not yet been applied to the Sinhala language domain. Therefore, we introduce a sentiment analysis model for the Sinhala language that uses a BERT-based language model known as Language-agnostic BERT Sentence Embedding (LaBSE). Classification is performed on both binary and multiclass datasets consisting of Sinhala news comments. The newly introduced model achieved an F1-score of 89.82% for binary classification and 64.72% for multiclass classification, surpassing the results of existing research that used deep learning and static word embedding approaches." en_US
dc.language.iso en en_US
dc.subject Language Models en_US
dc.subject Deep Learning en_US
dc.subject Sentiment Analysis en_US
dc.title Sentiment Analysis for the Sinhala Language with BERT Based Language Model en_US
dc.type Thesis en_US

