Digital Repository

Sinhala News Clustering using Contextual Word Embeddings

dc.contributor.author Hewaralalage Amaradasa, Dharshika Nayanathara
dc.date.accessioned 2023-01-12T09:28:20Z
dc.date.available 2023-01-12T09:28:20Z
dc.date.issued 2022
dc.identifier.citation Hewaralalage Amaradasa, Dharshika Nayanathara (2022) Sinhala News Clustering using Contextual Word Embeddings. MSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20200066
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/1402
dc.description.abstract "With modern technological expansion, accessing and sharing news has become an essential function of websites and social media platforms. As a result, many news sites on the web provide vast amounts of information to end users. News aggregators such as Google News and Bing News make this information easy to view. The main task of these tools is to collect news articles from different news sites and bring them into one location for easier access. These aggregators can also automatically cluster news articles based on the similarity of their content. While there are many news aggregators for the English language, limited research has been conducted on implementing Sinhala news aggregators [1]. Moreover, the existing Sinhala aggregators were implemented using traditional data representation techniques such as TF-IDF [2] with clustering algorithms. However, recent research on news document clustering [3] [4] [5] in other languages demonstrates the use of modern data representation techniques such as pre-trained word embeddings, namely GloVe [6], FastText [8] and Word2Vec [9]. Contextual word embeddings such as BERT [7] have also been used in some of the newest text clustering research due to their higher performance compared to word embeddings. This research explores the possibility of clustering Sinhala news using contextual word embeddings with suitable clustering algorithms." en_US
dc.language.iso en en_US
dc.subject Natural Language Processing en_US
dc.subject Document Clustering en_US
dc.subject Word Embeddings en_US
dc.title Sinhala News Clustering using Contextual Word Embeddings en_US
dc.type Thesis en_US

