Sintm LDA and Rake Based Topic Modelling For Sinhala

Rathnayake, Mudiyanselage Dinushika Ruwanthi Kumari

dc.contributor.author	Rathnayake, Mudiyanselage Dinushika Ruwanthi Kumari
dc.date.accessioned	2021-06-20T17:26:09Z
dc.date.available	2021-06-20T17:26:09Z
dc.date.issued	2020
dc.identifier.citation	Rathnayake, Mudiyanselage Dinushika Ruwanthi Kumari (2020) Sintm LDA and Rake Based Topic Modelling For Sinhala, MSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.other	2019002
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/499
dc.description.abstract	The growth of Information and Communication Technologies (ICT) has raised the popularity of the World Wide Web (WWW). Consequently, vast amounts of text content in the form of articles, newspapers, books, etc. have spread across the web. Since most of these documents are unstructured and heterogeneous, this has opened a new path for text analysis research: retrieving information from unstructured text and turning it into structured data. To improve the connection between information and human language, Natural Language Processing (NLP) research contributes to tasks such as information extraction, speech recognition, machine translation, summarization, and topic modelling. As technology has spread, Sinhala text usage on the web has also increased and started to gain attention among researchers. Although several techniques such as text classification, clustering, and named entity extraction have been applied to Sinhala, open research areas remain due to the limited amount of research and the lack of resources. The SinTM system was built to perform topic modelling, discovering topics in Sinhala text documents. The system provides a novel hybrid approach that combines topic modelling and keyword extraction techniques to detect topics in Sinhala text documents with better interpretability. It was tested with prominent topic model evaluation metrics such as likelihood, R-squared, perplexity, and coherence, and benchmarked against a well-known topic modelling algorithm, Latent Dirichlet Allocation (LDA). The SinTM system comes with a web user interface that gives the user more controllable parameter tuning and an easily understandable graph-based view, and it supports comparing the novel model against other well-known approaches.	en_US
dc.subject	Natural language processing	en_US
dc.subject	Topic Modelling	en_US
dc.subject	Latent Dirichlet Allocation	en_US
dc.subject	Rapid Automatic Keyword Extraction	en_US
dc.title	Sintm LDA and Rake Based Topic Modelling For Sinhala	en_US
dc.type	Thesis	en_US