Digital Repository

Sintm LDA and Rake Based Topic Modelling For Sinhala

dc.contributor.author Rathnayake, Mudiyanselage Dinushika Ruwanthi Kumari
dc.date.accessioned 2021-06-20T17:26:09Z
dc.date.available 2021-06-20T17:26:09Z
dc.date.issued 2020
dc.identifier.citation Rathnayake, Mudiyanselage Dinushika Ruwanthi Kumari (2020) Sintm LDA and Rake Based Topic Modelling For Sinhala, MSc. Dissertation Informatics Institute of Technology en_US
dc.identifier.other 2019002
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/499
dc.description.abstract The growth of Information and Communication Technologies (ICT) has increased the popularity of the World Wide Web (WWW). Consequently, vast amounts of text content in the form of articles, newspapers, books, etc. have spread across the web. Since most of these documents are unstructured and heterogeneous, this has opened a new path for text analysis research: retrieving information from unstructured text and turning it into structured data. To improve the connection between information and human language, Natural Language Processing (NLP) research contributes to tasks such as information extraction, speech recognition, machine translation, summarization, and topic modelling. As technology has spread, Sinhala text usage on the web has also increased and has started to gain attention among researchers. Although several techniques such as text classification, clustering, and named entity extraction have been applied to Sinhala, open research areas remain because of the limited number of studies and the lack of resources. The SinTM system was built to perform topic modelling, discovering topics in Sinhala text documents. The system provides a novel hybrid approach that combines topic modelling and keyword extraction techniques to detect topics in Sinhala text documents with better interpretability. It was tested with prominent topic model evaluation metrics such as likelihood, R-squared, perplexity, and coherence, and benchmarked against a well-known topic modelling algorithm, Latent Dirichlet Allocation (LDA). The SinTM system comes with a web user interface that gives the user more controllable parameter tuning and an easily understandable graph-based view, and that makes it possible to compare the novel model against other well-known approaches. en_US
dc.subject Natural language processing en_US
dc.subject Topic Modelling en_US
dc.subject Latent Dirichlet Allocation en_US
dc.subject Rapid Automatic Keyword Extraction en_US
dc.title Sintm LDA and Rake Based Topic Modelling For Sinhala en_US
dc.type Thesis en_US

