Abstract:
"Hate speech is a growing problem in today's society, and the ability to identify it quickly could help reduce its harmful effects. People now use social media to express their opinions without hesitation, and some use it to attack or take revenge on others. The comment sections of YouTube videos in particular contain a large number of hateful comments. Most users write in their native language using English characters, which produces code-mixed text.
Hate speech identification is an important problem in natural language processing. This study investigates the use of BERT, a deep learning model, for identifying hate speech in code-mixed Sinhala-English text. It aims to address the difficulty of recognizing hate speech in code-mixed language, which is common in multilingual countries such as Sri Lanka.
The paper compares the performance of BERT models trained on a dataset of code-mixed Sinhala-English hate speech with that of several other machine learning models. The results show that the BERT models outperform the other models on this task, achieving an accuracy of 92.3%. This research contributes to the development of tools for detecting hate speech in multilingual settings and demonstrates the potential of deep learning techniques in this field."