“Pulse Lanka” - An NLP model with Opinion Mining to Classify the Code-Mixed Code-Switched Singlish Reviews in Sri Lankan E-Commerce Platforms with Emoji Interpretation

Gunawardana, Adeesha

URI: http://dlib.iit.ac.lk/xmlui/handle/123456789/2415

Date: 2024

Abstract:

"Sri Lankan E-Commerce platforms are evolving rapidly, and understanding customer sentiment through reviews has become a critical challenge, particularly because of the prevalent use of code-mixed and code-switched text that combines English with Sinhalese, known as Singlish. This linguistic complexity significantly hinders automated opinion mining tools, which are primarily designed for monolingual text. As a result, businesses and researchers alike must rely on manual labour to decipher these reviews. This process is not only time-consuming but also requires a nuanced understanding of both English and Sinhalese. The dependency on human analysis limits the scalability of sentiment extraction and poses a significant bottleneck in harnessing the full potential of consumer feedback. This study aims to address the gap in automated sentiment analysis for Singlish reviews on Sri Lankan E-Commerce platforms, highlighting the need for advanced solutions that can navigate the intricacies of code-switching and code-mixing to extract sentiment efficiently and accurately. To tackle this challenge, the author proposed a solution that circumvents the complexities of code-mixed and code-switched language. The core of this solution involves first taking the raw code-mixed, code-switched reviews and transforming them into pure Sinhala Unicode through a transliteration process. This critical step ensures that the linguistic nuances and cultural context embedded in the original reviews are preserved and made more accessible for computational analysis. Following transliteration, the Sinhala Unicode texts are translated into English, creating a uniform dataset that is more amenable to analysis with conventional NLP techniques. 
By applying a variety of NLP methodologies to this translated corpus, the study extracts and interprets the sentiment expressed in the original reviews. In the evaluation, the study compared several ML models, of which Naïve Bayes and Random Forest gave the best results. Leveraging these insights, an ensemble of these two models further improved the key performance metrics: F1 score, precision and recall. This approach underscores the potential of combining ML techniques to interpret consumer sentiment more accurately."
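
The abstract does not say how the two models were combined or how the metrics were computed, so the following is only a minimal sketch under stated assumptions. It assumes a soft-voting ensemble (averaging the class probabilities of two classifiers) and binary precision, recall and F1 for one class. The function names, the labels "pos" and "neg", and all probability values are hypothetical and are not data or results from the study.

```python
from typing import Dict, List, Sequence, Tuple


def soft_vote(
    probs_a: Sequence[Dict[str, float]],
    probs_b: Sequence[Dict[str, float]],
) -> List[str]:
    """Average two models' per-class probabilities and pick the top class.

    Assumption: the ensemble is a plain soft vote with equal weights.
    """
    predictions = []
    for pa, pb in zip(probs_a, probs_b):
        classes = sorted(set(pa) | set(pb))
        averaged = {c: (pa.get(c, 0.0) + pb.get(c, 0.0)) / 2 for c in classes}
        predictions.append(max(classes, key=lambda c: averaged[c]))
    return predictions


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical class probabilities for four translated reviews, standing in
# for the outputs of a Naive Bayes model and a Random Forest model.
nb_probs = [
    {"pos": 0.9, "neg": 0.1},
    {"pos": 0.4, "neg": 0.6},
    {"pos": 0.3, "neg": 0.7},
    {"pos": 0.6, "neg": 0.4},
]
rf_probs = [
    {"pos": 0.7, "neg": 0.3},
    {"pos": 0.7, "neg": 0.3},
    {"pos": 0.2, "neg": 0.8},
    {"pos": 0.3, "neg": 0.7},
]
y_true = ["pos", "pos", "neg", "pos"]  # hypothetical gold labels

y_pred = soft_vote(nb_probs, rf_probs)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="pos")
print(y_pred, round(precision, 3), round(recall, 3), round(f1, 3))
```

In this toy input the second review shows the point of averaging: one model leans negative and the other positive, and the combined probability decides the label. A weighted vote or a hard majority vote would be equally plausible readings of "ensemble".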
