Digital Repository

Identifying Hate Speech in Romanized Sinhala in Social Media Comments Using NLP

dc.contributor.author Munasinghe, Sandaru
dc.date.accessioned 2022-12-20T05:09:08Z
dc.date.available 2022-12-20T05:09:08Z
dc.date.issued 2022
dc.identifier.citation Munasinghe, Sandaru (2022) Identifying Hate Speech in Romanized Sinhala in Social Media Comments Using NLP. BEng. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2018161
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/1188
dc.description.abstract "In today's society, the use of harsh language on social media sites is becoming a major concern. The rise of hate speech is dangerous because it can be threatening, racially motivated, or ethnically targeted. Hate speech is expressed more often through social media than in person, because users feel confident speaking freely online. With that confidence, users are comfortable writing hate speech in comments, which in turn triggers flame wars in comment threads. Detecting hate speech is therefore important: it can help build a safer environment on social media and make users less prone to being influenced into using hate speech. Hate speech appears in many languages, but much of it is written in Romanized form, because typing in Latin script on modern devices is more convenient than writing in the native script. Because a majority of users write in Romanized languages, the author chose to work with Romanized Sinhala. Much of the research so far has focused on English. This study aims to identify hate speech published in Romanized Sinhala and to define the gap in existing research. The main goal is to automatically detect hate content in social media comments written in Romanized Sinhala, using a manually labeled dataset and NLP technologies. Deep learning techniques yield high training and validation accuracy: BERT text classification achieves a training accuracy of 91.99% and a validation accuracy of 95.38%. The proposed solution addresses the identified gap by achieving high training and validation accuracy relative to existing hate speech detection systems." en_US
dc.language.iso en en_US
dc.subject Romanized Sinhala hate speech detection en_US
dc.subject Natural language processing en_US
dc.subject Text Classifications en_US
dc.title Identifying Hate Speech in Romanized Sinhala in Social Media Comments Using NLP en_US
dc.type Thesis en_US

