Digital Repository

Prediction of Social media text-based comments toxicity during Sri Lankan Crisis with Boosting Algorithms


dc.contributor.author Pasqual, Nethmie
dc.date.accessioned 2024-02-12T10:09:18Z
dc.date.available 2024-02-12T10:09:18Z
dc.date.issued 2023
dc.identifier.citation Pasqual, Nethmie (2023) Prediction of Social media text-based comments toxicity during Sri Lankan Crisis with Boosting Algorithms. MSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20211045
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/1646
dc.description.abstract Nowadays, much communication happens through social media platforms such as Twitter, Facebook and YouTube, so large volumes of information are produced daily on these platforms and on online forums. Comments are a way of expressing one's opinion about content on those platforms. In some cases, however, people use these platforms to spread disrespectful and harmful views about others by posting abusive content. Such content negatively affects content creators and other viewers, and can result in the entire comment section being closed for all users. These comments are called toxic, and they should be avoided wherever possible. Many studies have addressed identifying toxic comments and filtering them out using various methodologies. Machine learning techniques are the most popular in this context, but little research has evaluated these methodologies comparatively. This research analyzes text-based comments made during the Sri Lankan Crisis, which is still ongoing. The study discusses which classification algorithms can be used in this problem domain and focuses only on text-based comments, analyzing three separate datasets collected manually by web scraping Facebook. Each dataset contains a fairly large number of comments, and each was used to train and test six classification models, taking the class distribution of the dataset into account. Bag-of-Words (BoW) and TF-IDF were used as the feature extraction techniques. Boosting algorithms (Gradient Boosting, CatBoost, XGBoost and AdaBoost), together with stacked and hybrid models, were used as the classification algorithms. The study follows a supervised machine learning approach, framing toxicity prediction as a binary classification task. en_US
dc.language.iso en en_US
dc.publisher IIT en_US
dc.subject Toxicity Prediction en_US
dc.subject Classification Models en_US
dc.subject Boosting Algorithms en_US
dc.title Prediction of Social media text-based comments toxicity during Sri Lankan Crisis with Boosting Algorithms en_US
dc.type Thesis en_US
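The abstract describes Bag-of-Words and TF-IDF features feeding boosting classifiers that assign each comment a binary toxic/non-toxic label. As a rough illustration only, and not the author's implementation, the pure-Python sketch below builds both feature types and trains a discrete AdaBoost ensemble of decision stumps. The four toy comments, the +1 = toxic label convention, the whitespace tokenizer, the unsmoothed IDF variant and all function names are assumptions made for this example. In practice, libraries such as scikit-learn, XGBoost or CatBoost would normally be used instead.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; no punctuation handling).
    return text.lower().split()

def build_vocab(docs):
    # Sorted vocabulary so feature indices are deterministic.
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def bag_of_words(doc, vocab):
    # Bag-of-Words: raw count of each vocabulary term in the document.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

def fit_idf(docs, vocab):
    # Unsmoothed IDF = log(N / df); every vocab term occurs in >= 1 training doc.
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return [math.log(len(docs) / df[term]) for term in vocab]

def tf_idf(doc, vocab, idf):
    # TF-IDF: term frequency (count / doc length) scaled by the training IDF.
    counts = Counter(tokenize(doc))
    total = sum(counts.values()) or 1
    return [counts[term] / total * idf[j] for j, term in enumerate(vocab)]

def stump_predict(row, j, thr, pol):
    # Decision stump: +pol if feature j exceeds the threshold, else -pol.
    return pol if row[j] > thr else -pol

def fit_stump(X, y, w):
    # Exhaustive search over (feature, threshold, polarity) for minimum weighted error.
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, rounds=10):
    # Discrete AdaBoost for labels in {-1, +1}.
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        if err >= 0.5:
            break  # no stump beats chance on the weighted data
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        model.append((alpha, j, thr, pol))
        if err == 0:
            break  # perfect split; later rounds would only repeat this stump
        # Up-weight misclassified examples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    # Sign of the alpha-weighted vote of all stumps (+1 = toxic, by assumption).
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical toy comments, not data from the thesis: +1 = toxic, -1 = non-toxic.
docs = ["you are stupid", "stupid idiot go away",
        "thanks for the update", "great video thanks"]
labels = [1, 1, -1, -1]

vocab = build_vocab(docs)
idf = fit_idf(docs, vocab)
X = [tf_idf(d, vocab, idf) for d in docs]
model = adaboost(X, labels)

print(bag_of_words("stupid idiot go away", vocab))        # [0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print([predict(model, row) for row in X])                 # [1, 1, -1, -1]
print(predict(model, tf_idf("what a stupid take", vocab, idf)))  # 1
```

AdaBoost is shown here because it is the simplest of the boosting methods named in the abstract: each round reweights the training comments so that the next stump focuses on the ones misclassified so far. Gradient Boosting, XGBoost and CatBoost instead fit each new tree to the gradient of a loss function. The stacked and hybrid models, as well as any handling of class imbalance, are not sketched.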

