dc.contributor.author |
Kumari, R.M.D.R. |
|
dc.contributor.author |
Hettiarachchi, Saman |
|
dc.date.accessioned |
2025-04-11T10:23:05Z |
|
dc.date.available |
2025-04-11T10:23:05Z |
|
dc.date.issued |
2021 |
|
dc.identifier.citation |
Kumari, R.M.D.R. and Hettiarachchi, S. (2021) ‘SinTM - LDA and RAKE based Topic Modelling for Sinhala Language’, in 2021 Asian Conference on Innovation in Technology (ASIANCON), pp. 1–5. Available at: https://doi.org/10.1109/ASIANCON51346.2021.9545070. |
en_US |
dc.identifier.uri |
https://ieeexplore.ieee.org/document/9545070 |
|
dc.identifier.uri |
http://dlib.iit.ac.lk/xmlui/handle/123456789/2232 |
|
dc.description.abstract |
Advances in technology have increased the use of textual information worldwide. The growing volume of unstructured and heterogeneous text data has become hard to manage. Topic modelling is a technique that retrieves abstract topics from a collection of documents, and it is important for discovering hidden and useful information in large volumes of unstructured and heterogeneous text data. Sinhala is the native language of Sri Lanka and is primarily spoken by the Sinhalese. This paper presents a novel approach called SinTM that analyzes a single Sinhala text document by combining topic modelling and keyword extraction techniques. The results were benchmarked against the well-known topic modelling algorithm Latent Dirichlet Allocation (LDA), and SinTM was evaluated with prominent topic model evaluation metrics: likelihood, R-squared, perplexity and coherence. We show that SinTM achieves better results for Sinhala than LDA. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
IEEE |
en_US |
dc.subject |
Topic Modelling |
en_US |
dc.subject |
Analytical models |
en_US |
dc.subject |
Sinhala Language |
en_US |
dc.title |
SinTM - LDA and RAKE based Topic Modelling for Sinhala Language |
en_US |
dc.type |
Article |
en_US |