| dc.description.abstract | 
The growth of Information and Communication Technologies (ICT) has raised the popularity of the World Wide Web (WWW). Consequently, vast amounts of text content in the form of articles, newspapers, books, etc. have spread across the web. Since most of these documents are unstructured and heterogeneous, this has opened a new path for text analysis research: retrieving information from unstructured text and creating structured data from it. 
To improve the interconnection between information and human language, Natural Language Processing (NLP) research contributes to tasks such as information extraction, speech recognition, machine translation, summarization, and topic modelling. As technology adoption has grown, Sinhala text usage on the web has also increased and has started to gain attention among researchers. Although several techniques such as text classification, clustering, and named entity extraction have been applied to Sinhala, open research areas remain due to the limited number of studies and the lack of resources. 
The SinTM system was built to perform topic modelling, discovering topics in Sinhala text documents. The system provides a novel hybrid approach that combines topic modelling and keyword extraction techniques to detect topics in Sinhala text documents with better interpretability. It was tested with prominent topic model evaluation metrics such as likelihood, R-squared, perplexity, and coherence, and benchmarked against a well-known topic modelling algorithm, Latent Dirichlet Allocation (LDA). The SinTM system comes with a web user interface that provides more controllable parameter tuning and an easily understandable graph-based view, and it is rich in its ability to compare the novel model against other well-known approaches. | 
en_US |