Abstract:
"Sri Lankan E-Commerce platforms are evolving rapidly, and understanding the sentiment expressed in
their reviews has become a critical challenge, particularly because of the prevalent use of code-mixed
and code-switched text that combines English with Sinhala, known as Singlish. This linguistic complexity
significantly hinders automated opinion mining tools, which are primarily designed for
monolingual text. As a result, businesses and researchers alike must rely on manual labour to
decipher these reviews. This process is not only time-consuming but also requires a nuanced
understanding of both English and Sinhala. The dependency on human analysis limits the
scalability of sentiment extraction and creates a significant bottleneck in harnessing the full
potential of consumer feedback. This study aims to address the gap in automated sentiment
analysis for Singlish reviews on Sri Lankan E-Commerce platforms, highlighting the need for
solutions that can handle the intricacies of code-switching and code-mixing to extract
sentiment efficiently and accurately.
To tackle the challenge of analysing sentiment in Singlish reviews on Sri Lankan E-Commerce
platforms, the author proposed a solution that sidesteps the complexities of code-mixed and
code-switched language. At its core, the solution first takes the raw code-mixed and
code-switched reviews and transforms them into pure Sinhala Unicode through a
transliteration process. This step aims to preserve the linguistic nuances and
cultural context embedded in the original reviews while making them more accessible for
computational analysis. Following transliteration, the Sinhala Unicode texts are translated
into English, creating a uniform dataset that is more amenable to analysis with conventional NLP
techniques. By applying a variety of NLP methods to this translated corpus, the study
efficiently extracts and interprets the sentiment expressed in the original reviews.
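The two-stage preprocessing described above (Singlish to Sinhala Unicode, then Sinhala to English) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the author's implementation: the abstract does not specify the transliteration rules or the translation system, so the romanization tables (`CONSONANTS`, `VOWELS`), the toy `LEXICON` standing in for a real translator, and the function names `transliterate`, `translate` and `preprocess` are all assumptions. Real Singlish spelling is ambiguous (for example, "t" may stand for ට or ත), so a practical system would need a much richer mapping or a learned model.

```python
# Hypothetical sketch of the Singlish -> Sinhala Unicode -> English pipeline.
# All tables below are illustrative assumptions, not the study's rules.

# Romanized consonant -> Sinhala consonant (one reading per key; assumed).
CONSONANTS = {
    "k": "\u0d9a",   # ක
    "t": "\u0da7",   # ට (assumed default reading of "t")
    "th": "\u0dad",  # ත
    "d": "\u0daf",   # ද
    "n": "\u0db1",   # න
    "p": "\u0db4",   # ප
    "m": "\u0db8",   # ම
    "r": "\u0dbb",   # ර
    "l": "\u0dbd",   # ල
    "v": "\u0dc0",   # ව
    "s": "\u0dc3",   # ස
    "h": "\u0dc4",   # හ
}
# Romanized vowel -> (independent vowel letter, dependent vowel sign).
VOWELS = {
    "aa": ("\u0d86", "\u0dcf"),  # ආ / ා
    "ae": ("\u0d87", "\u0dd0"),  # ඇ / ැ
    "a": ("\u0d85", ""),         # අ / inherent vowel, so no sign is added
    "ii": ("\u0d8a", "\u0dd3"),  # ඊ / ී
    "i": ("\u0d89", "\u0dd2"),   # ඉ / ි
    "uu": ("\u0d8c", "\u0dd6"),  # ඌ / ූ
    "u": ("\u0d8b", "\u0dd4"),   # උ / ු
    "ee": ("\u0d92", "\u0dda"),  # ඒ / ේ
    "e": ("\u0d91", "\u0dd9"),   # එ / ෙ
    "oo": ("\u0d95", "\u0ddd"),  # ඕ / ෝ
    "o": ("\u0d94", "\u0ddc"),   # ඔ / ො
}
HAL = "\u0dca"  # al-lakuna (virama): marks a consonant with no vowel

# Toy Sinhala -> English lexicon standing in for a real translator (assumed).
LEXICON = {
    "\u0db1\u0dbb\u0d9a": "bad",                       # නරක
    "\u0dc3\u0dd4\u0db4\u0dd2\u0dbb\u0dd2": "superb",  # සුපිරි
}


def _longest_match(text, i, table):
    """Return the longest key (2 or 1 chars) of `table` starting at text[i]."""
    for size in (2, 1):
        key = text[i:i + size]
        if len(key) == size and key in table:
            return key
    return None


def transliterate(word):
    """Greedy longest-match transliteration of one Singlish word."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        cons = _longest_match(word, i, CONSONANTS)
        if cons:
            i += len(cons)
            vowel = _longest_match(word, i, VOWELS)
            if vowel:
                i += len(vowel)
                out.append(CONSONANTS[cons] + VOWELS[vowel][1])
            else:
                out.append(CONSONANTS[cons] + HAL)
            continue
        vowel = _longest_match(word, i, VOWELS)
        if vowel:
            i += len(vowel)
            out.append(VOWELS[vowel][0])
        else:
            out.append(word[i])  # digits, punctuation, unmapped letters
            i += 1
    return "".join(out)


def translate(sinhala_text):
    """Word-by-word lookup; words missing from the lexicon pass through."""
    return " ".join(LEXICON.get(w, w) for w in sinhala_text.split())


def preprocess(review):
    """Singlish review -> Sinhala Unicode -> English (toy version)."""
    return translate(" ".join(transliterate(w) for w in review.split()))


if __name__ == "__main__":
    print(preprocess("Naraka supiri"))
```

For example, `transliterate("naraka")` yields නරක and `preprocess("Naraka supiri")` returns `"bad superb"`. Greedy longest-match is used here only because it keeps the sketch short; it cannot resolve the ambiguities that make real Singlish transliteration hard.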
In the evaluation, the study compared several machine learning (ML) models, of which Naïve Bayes and
Random Forest gave the best results. Building on these findings, an ensemble of the two models
further improved the key performance metrics: F1 score, precision and recall. This approach
underscored the potential of combining ML techniques to interpret consumer sentiment more
accurately."
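The abstract does not state how the Naïve Bayes and Random Forest models were combined. One common option is soft voting, which averages each model's predicted probability of the positive class; it is sketched below in plain Python together with the precision, recall and F1 computations named in the abstract. The binary positive/negative labels, the equal weighting, the 0.5 threshold, the function names `soft_vote` and `precision_recall_f1`, and the toy probabilities are all assumptions made for illustration; they are not data or results from the study.

```python
# Hypothetical soft-voting ensemble of two classifiers' probability outputs,
# plus precision / recall / F1 for the positive class (binary task assumed).
from typing import List, Sequence, Tuple


def soft_vote(prob_nb: Sequence[float], prob_rf: Sequence[float],
              weight_nb: float = 0.5, threshold: float = 0.5) -> List[int]:
    """Weighted average of P(positive) from two models, then threshold."""
    if len(prob_nb) != len(prob_rf):
        raise ValueError("both models must score the same reviews")
    combined = [weight_nb * a + (1 - weight_nb) * b
                for a, b in zip(prob_nb, prob_rf)]
    return [1 if p >= threshold else 0 for p in combined]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 of `positive`; 0.0 where undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely for illustration.
    y_true = [1, 0, 1, 1, 0]
    nb_probs = [0.9, 0.4, 0.45, 0.7, 0.2]
    rf_probs = [0.6, 0.7, 0.65, 0.4, 0.1]
    for name, preds in [
        ("Naive Bayes", [int(p >= 0.5) for p in nb_probs]),
        ("Random Forest", [int(p >= 0.5) for p in rf_probs]),
        ("Soft-vote ensemble", soft_vote(nb_probs, rf_probs)),
    ]:
        p, r, f = precision_recall_f1(y_true, preds)
        print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With only two models, hard (majority) voting can tie, which is one reason averaging probabilities is a natural fit for a two-model ensemble. In scikit-learn the same idea is available as `VotingClassifier(..., voting="soft")` wrapping, for example, `MultinomialNB` and `RandomForestClassifier`; whether the study used that class is not stated.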