Digital Repository

Hate Speech Detection in Sinhala-English Code-Mixed Language

dc.contributor.author Liyanage, Oshadhi
dc.contributor.author Jayakumar, Krishnakripa
dc.date.accessioned 2025-04-29T06:20:06Z
dc.date.available 2025-04-29T06:20:06Z
dc.date.issued 2021
dc.identifier.citation Liyanage, O. and Jayakumar, K. (2021) ‘Hate Speech Detection in Sinhala-English Code-Mixed Language’, in 2021 21st International Conference on Advances in ICT for Emerging Regions (ICter), pp. 225–230. Available at: https://doi.org/10.1109/ICter53630.2021.9774816. en_US
dc.identifier.uri https://ieeexplore.ieee.org/document/9774816
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2291
dc.description.abstract With the steady increase of user-generated content on the internet, the amount of hate content is also increasing rapidly. Social media sites, review forums, and microblogging sites encourage users to convey their thoughts with minimal restrictions. This leads some users to express hate towards others who do not share their beliefs. This study focuses on identifying hate speech in text written in Sinhala-English code-mixed language (Singlish), which is mostly used by Sri Lankans on the internet. Because no Sinhala-English code-mixed dataset was available, a dataset was created from comments on YouTube and Facebook. In this research, eight machine learning algorithms and three ensemble approaches were evaluated for detecting hate speech in Singlish, using accuracy, precision, recall, and F1-score. Based on the performance of the considered algorithms, Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), AdaBoost, and Logistic Regression classifiers were used to develop ensemble learning-based solutions. Three ensemble learning approaches were evaluated: soft voting, hard voting, and stacking. The hard voting approach outperformed the baseline algorithms and the other ensemble approaches, with 84% accuracy and F1-score. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.subject Ensemble Learning en_US
dc.subject Hate Speech detection en_US
dc.subject Support Vector Machine en_US
dc.subject Hate speech en_US
dc.title Hate Speech Detection in Sinhala-English Code-Mixed Language en_US
dc.type Article en_US
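
The abstract compares hard voting and soft voting over four base classifiers. The following is a minimal sketch of how the two rules differ. It is not the authors' implementation: the function names, the class labels "hate" and "not_hate", and all probability values are invented for illustration.

```python
from collections import Counter


def hard_vote(labels):
    """Hard voting: return the class label predicted by most classifiers."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Soft voting: average per-class probabilities, return the top class."""
    classes = probabilities[0].keys()
    n = len(probabilities)
    averaged = {c: sum(p[c] for p in probabilities) / n for c in classes}
    return max(averaged, key=averaged.get)


# Hypothetical outputs of four base classifiers for one comment.
# These numbers are made up; they are not results from the paper.
probabilities = [
    {"hate": 0.45, "not_hate": 0.55},
    {"hate": 0.45, "not_hate": 0.55},
    {"hate": 0.40, "not_hate": 0.60},
    {"hate": 0.95, "not_hate": 0.05},
]

# Each classifier's hard label is its most probable class.
labels = [max(p, key=p.get) for p in probabilities]

print(hard_vote(labels))         # not_hate (three of four votes)
print(soft_vote(probabilities))  # hate (mean probability 0.5625)
```

In this invented case the two rules disagree: three classifiers lean weakly towards "not_hate", while one is highly confident of "hate", so the majority vote and the averaged probabilities point in different directions. Stacking, the third approach named in the abstract, would instead train a separate model on the base classifiers' outputs and is not shown here.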

