Digital Repository

SinClassify - Sinhala Text Classification System

dc.contributor.author Koralage, Anjuka Dulan
dc.date.accessioned 2020-04-27T17:26:00Z
dc.date.available 2020-04-27T17:26:00Z
dc.date.issued 2019
dc.identifier.citation Koralage, Anjuka Dulan (2019) SinClassify - Sinhala Text Classification System. BSc. Dissertation Informatics Institute of Technology. en_US
dc.identifier.other 2014022
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/295
dc.description.abstract Since the beginning of the world, humans and animals have used various methods to exchange ideas with one another. Among these, language is the high-level mechanism for communication. Language changes and grows richer day by day, and its basics rest on sounds and characters (text). In the recent past, as a result of the growth of technology, the volume of textual content on the web and in the IT sector has risen sharply. Not only widely used languages such as English, French, and Chinese, but also less widely used languages such as Sinhala and Tamil, now show a reasonable amount of content on the web and in IT-related publications. For these reasons, automatic text categorization has become an important research direction. The purpose of this project is to create an accurate text classification mechanism for the Sinhala language. The project, called "SinClassify", follows a machine learning and natural language processing approach and the steps recommended for text processing. The SinNG5 Sinhala corpus dataset (Lakmali and Haddela, 2018) was combined with additional data from online resources (adaderana.lk and bbcsinhala.com, 2019) and rebuilt into a proper dataset that serves as the corpus for this project. Because of the limitations, the data preprocessing stage uses a customized stop word list prepared under expert Sinhala-language supervision. TF-IDF is used to represent the text as numerical vectors. To obtain better results, several machine learning classification methods were examined and the best-performing one was selected. The target audience of this project is users of computerized Sinhala text on the web or in any other IT-related sector, such as university students, journalists, Sinhala language researchers, and general web readers. With the application built on the classifier, users can copy Sinhala text content while reading a document and classify it through a mobile application. en_US
dc.subject Automatic text classification en_US
dc.subject Sinhala language en_US
dc.subject Multi-class classification en_US
dc.title SinClassify - Sinhala Text Classification System en_US
dc.type Thesis en_US
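
The pipeline outlined in the abstract (stop word removal, TF-IDF vectors, then a classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three stop words, the four toy training sentences, the "sports" and "business" labels, and the nearest-centroid classifier are stand-ins, not the expert-reviewed stop word list, the SinNG5-based corpus, or the classifier selected in the dissertation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop words; the dissertation's customized list is not reproduced here.
STOP_WORDS = {"සහ", "ද", "මෙම"}

# Toy training data (assumed for illustration, not taken from the project corpus).
TEXTS = [
    "කණ්ඩායම තරගය ජය ගත්තා",
    "පන්දුව සහ තරගය",
    "බැංකුව පොලී අනුපාත වැඩි කළා",
    "මුදල් සහ බැංකුව",
]
LABELS = ["sports", "sports", "business", "business"]


def tokenize(text):
    """Split on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def fit_idf(token_docs):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(token_docs)
    doc_freq = Counter(term for doc in token_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vector(tokens, idf):
    """Build a unit-length TF-IDF vector, ignoring terms unseen in training."""
    counts = Counter(tok for tok in tokens if tok in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def train(texts, labels):
    """Return the IDF table and one unit-length centroid per class."""
    token_docs = [tokenize(t) for t in texts]
    idf = fit_idf(token_docs)
    sums = defaultdict(lambda: defaultdict(float))
    for tokens, label in zip(token_docs, labels):
        for term, weight in tfidf_vector(tokens, idf).items():
            sums[label][term] += weight
    centroids = {label: normalize(vec) for label, vec in sums.items()}
    return idf, centroids


def classify(text, idf, centroids):
    """Pick the class whose centroid has the highest cosine similarity."""
    vec = tfidf_vector(tokenize(text), idf)

    def cosine(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


idf, centroids = train(TEXTS, LABELS)
print(classify("මෙම තරගය කණ්ඩායම", idf, centroids))
```

A nearest-centroid rule is used only because it fits in a few lines of standard-library Python. Comparing several classifiers, as the abstract describes, would mean replacing `classify` with each candidate model and evaluating them on held-out data.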

