Digital Repository

ADAPTTEXT : A Novel technique for domain independent Sinhala text classification


dc.contributor.author Rawya, K. A. Y
dc.date.accessioned 2022-03-16T08:31:20Z
dc.date.available 2022-03-16T08:31:20Z
dc.date.issued 2021
dc.identifier.citation Rawya, K. A. Y. (2021) ADAPTTEXT : A Novel technique for domain independent Sinhala text classification. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2017576
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/1022
dc.description.abstract "Text classification sorts text data into categories by assigning labels to it. It is a core task in natural language processing with a wide range of use cases, including fake-news detection, sentiment classification, hate speech and cyberbullying detection, user intent classification, and news-article classification, among many others. Sinhala, the most widely used language in Sri Lanka, is morphologically rich and agglutinative, which makes it more complex than languages with a simple morphology such as English. Furthermore, Sinhala has its own writing system, so solutions developed for English may not be reusable. Because of these complexities, and because Sinhala is a low-resource language, there is no proper generic and automated solution for Sinhala text classification. Even the current task-specific text classification approaches have not considered the polysemy of words and have not focused on addressing data scarcity. Therefore, AdaptText has been developed as a novel generic architecture and technique for Sinhala text classification. To measure the performance of the proposed technique, cross-domain testing and evaluation were carried out with multiple datasets and against the current best-performing approaches. AdaptText addresses the identified research gaps and achieves state-of-the-art results for both binary and multiclass Sinhala text classification." en_US
dc.language.iso en en_US
dc.subject Supervised learning en_US
dc.subject Classification algorithms en_US
dc.subject Knowledge transfer en_US
dc.subject Natural language processing en_US
dc.title ADAPTTEXT : A Novel technique for domain independent Sinhala text classification en_US
dc.type Thesis en_US

