Abstract:
"This study investigates the detection of cyberbullying in Romanized Sinhala using various
machine learning classifiers and feature extraction methods. The primary objective is to
identify the most effective pairing of classifier and feature extraction method for this
task. We employ rule-based, Bag-of-Words (BoW), and Term Frequency-Inverse Document
Frequency (TF-IDF) feature extraction methods, as well as additional features such as word
count and gender. The classifiers studied include K-Nearest Neighbours (KNN), a Voting ensemble,
Random Forest, Support Vector Machines (SVM), Decision Tree, Naive Bayes, Multilayer
Perceptron (MLP), AdaBoost, and Logistic Regression."
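
To make the feature extraction step concrete, the following is a minimal sketch of BoW and TF-IDF vectors with an appended word-count feature. It is not the study's implementation. The function names (tokenize, bow_vectors, tfidf_vectors, add_word_count), the whitespace tokenizer, and the smoothed, L2-normalised TF-IDF variant are all assumptions made for illustration, and the example strings are placeholder tokens rather than real Romanized Sinhala data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real tokenizer for
    # Romanized Sinhala would likely need spelling normalisation as well.
    return text.lower().split()


def bow_vectors(docs):
    # Bag-of-Words: one column per vocabulary term, values are raw counts.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    # TF-IDF built on the BoW counts. Assumed variant:
    # idf = ln((1 + n_docs) / (1 + df)) + 1, then each row is L2-normalised.
    vocab, counts = bow_vectors(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1.0 for df in doc_freq]
    vectors = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        vectors.append([x / norm for x in weighted])
    return vocab, vectors


def add_word_count(docs, vectors):
    # Append the token count of each document as one extra feature column.
    return [vec + [len(tokenize(doc))] for doc, vec in zip(docs, vectors)]


if __name__ == "__main__":
    # Placeholder documents (hypothetical tokens, not corpus data).
    docs = ["aaa bbb", "bbb ccc ccc"]
    vocab, bow = bow_vectors(docs)
    _, tfidf = tfidf_vectors(docs)
    print(vocab)
    print(bow)
    print(add_word_count(docs, tfidf))
```

Each resulting row can be passed to any of the listed classifiers. In TF-IDF, a term that appears in every document (here "bbb") receives a lower weight than a term confined to one document, which is the main difference from raw BoW counts.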