Abstract:
"In the contemporary digital age, the proliferation of social media platforms like Facebook and
YouTube has facilitated unprecedented levels of interaction and content sharing among
individuals worldwide. Alongside these advances, however, harmful and toxic comments,
particularly hate speech, have become a pressing concern because they can
adversely affect users, content creators, and businesses. This study focuses on the specific
context of Sri Lanka, a multicultural and multilingual developing nation where Facebook is a
primary medium for sharing thoughts and ideas. Because the country has approximately 4.5 million
active users in a population of 20.76 million, understanding and addressing the prevalence of hate
speech in Sinhala and Singlish comments on social media platforms is crucial.
The methodology assesses user comments and feedback on various social media platforms
to determine whether they contain hate speech elements. Natural Language Processing (NLP)
techniques, including sentiment analysis, topic modeling, and classification algorithms, are
used to analyze the linguistic patterns and contextual nuances of the comments.
Additionally, machine learning models are trained on annotated datasets to classify comments
as either hate speech or non-hate speech, thereby automating the detection process.
Preliminary results offer insight into the prevalence and characteristics of hate
speech in Sinhala and Singlish comments on social media platforms. A classification
model achieved an initial accuracy of 85% in identifying hate speech instances.
Additional evaluation metrics, namely precision, recall, and F1 score, give a fuller
picture of the model's performance. These findings contribute to the development of
effective strategies for mitigating hate speech and fostering a safer online environment in
multicultural digital communities."
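
The abstract does not name the model, features, or dataset used, so the following is only a minimal sketch of the kind of pipeline it describes: a classifier trained on annotated comments, then scored with accuracy, precision, recall, and F1. It assumes a multinomial naive Bayes model over character trigrams, a feature choice that can tolerate the inconsistent spellings common in romanized text. The class name NaiveBayesCommentClassifier, the helpers char_ngrams and evaluate, the labels "hate" and "other", and the four English training comments are all hypothetical placeholders, not material from the study.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded comment."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesCommentClassifier:
    """Multinomial naive Bayes over character n-grams with add-one smoothing.

    Assumption: the study's actual model is not specified in the abstract.
    """

    def fit(self, comments, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for comment, label in zip(comments, labels):
            self.feature_counts[label].update(char_ngrams(comment))
        self.vocabulary = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, comment):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(comment):
                score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="hate"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Invented placeholder data; a real run would use the annotated
# Sinhala and Singlish comments, split into training and test sets.
train_comments = [
    "you people are worthless",
    "get out of our country you filth",
    "thank you for sharing this video",
    "great song, i love it",
]
train_labels = ["hate", "hate", "other", "other"]

model = NaiveBayesCommentClassifier().fit(train_comments, train_labels)
predictions = [model.predict(c) for c in train_comments]
print(evaluate(train_labels, predictions))
```

Reporting precision and recall next to accuracy matters for this task because hate speech is usually the minority class: a model that labels every comment as non-hate can score a high accuracy while detecting nothing. Scores computed on the training comments, as in this toy run, say nothing about generalization; a held-out test set is needed for that.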