Digital Repository

SPAMBRELLA : Machine learning based Sinhala spam comments detection system for YouTube

dc.contributor.author De Silva, R.N
dc.date.accessioned 2023-08-03T06:16:46Z
dc.date.available 2023-08-03T06:16:46Z
dc.date.issued 2020
dc.identifier.citation De Silva, R.N (2021) SPAMBRELLA : Machine learning based Sinhala spam comments detection system for YouTube. BEng. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2016357
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/1603
dc.description.abstract Identifying and moderating Sinhala spam content on social media is challenging for users. YouTube has emerged as a major competitor in the video-sharing space. One of its most used features is that users can comment on others' videos. This feature lets users interact with others and share their sentiments, opinions, and so on. It has also become an open door for malicious users to share promotional, harmful, and misleading content, known as spam. Spam can be considered harmful and abusive, and it poses a potential cyber security threat to end users. Detecting spam content is difficult because of language-dependent limitations. Therefore, automatic identification of spam comments on social media has become a requirement of utmost importance. Simple keyword-spotting procedures cannot identify the exact intention of a comment. The proposed system addresses this issue by building an ensemble spam classification model with machine learning that can classify spam comments in the Sinhala language. This study developed different pre-processing techniques for Sinhala sentence normalization. The unique features of Sinhala spam comments were investigated for the feature extraction phase. Using different natural language techniques, the content was classified into the spam and non-spam domains. Different feature extraction techniques and both ensemble and single classifiers were used to classify the text and improve the performance of the system. The trained model was then able to classify racist comments with an 88.0% accuracy in experimental results. The project was evaluated through both self-evaluation and expert evaluation.
Therefore, the requirement for automatic identification of racist comments on social media has become of utmost importance. However, simple keyword-spotting techniques cannot be used to accurately identify the exact intent of a comment. In this paper, we address this issue by building a text analytics model with machine learning that can be used to filter racist comments in the Sinhala language. A Two-Class Support Vector Machine was trained with a set of carefully chosen comments from Facebook that were labelled as racist and non-racist based on en_US
dc.language.iso en en_US
dc.publisher IIT en_US
dc.title SPAMBRELLA : Machine learning based Sinhala spam comments detection system for YouTube en_US
dc.type Thesis en_US

