Prediction of Social media text-based comments toxicity during Sri Lankan Crisis with Boosting Algorithms

Pasqual, Nethmie

dc.contributor.author	Pasqual, Nethmie
dc.date.accessioned	2024-02-12T10:09:18Z
dc.date.available	2024-02-12T10:09:18Z
dc.date.issued	2023
dc.identifier.citation	Pasqual, Nethmie (2023) Prediction of Social media text-based comments toxicity during Sri Lankan Crisis with Boosting Algorithms. MSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	20211045
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/1646
dc.description.abstract	Much of today's communication takes place on social media platforms such as Twitter, Facebook and YouTube, so large volumes of information are produced daily on these platforms and on online forums. Comments let users express their opinions about content on those platforms. In some cases, however, people use them to spread disrespectful or harmful views about others by posting abusive content. Such content harms content creators and other viewers, and can lead to the entire comment section being closed to all users. Comments of this kind are called toxic and should be prevented wherever possible. Many studies have sought to identify toxicity and filter out such comments using various methodologies. Machine learning techniques are the most widely used in this context, but little research has evaluated those methodologies comparatively. This research analyzes text-based comments made during the Sri Lankan Crisis, which is still ongoing. The study discusses which classification algorithms can be applied in this problem domain and focuses only on text-based comments, analyzing three separate datasets collected from Facebook through manual web scraping. Each dataset contained a fairly large number of comments and was trained and tested on six classification models, taking the distribution of the dataset into account. Bag-of-words (BoW) and TF-IDF were used as the feature extraction techniques. Gradient Boosting, CatBoost, XGBoost and AdaBoost, together with stacked and hybrid models, were used as the classification algorithms. The study follows a supervised machine learning approach and treats toxicity prediction as a binary classification task.	en_US
dc.language.iso	en	en_US
dc.publisher	IIT	en_US
dc.subject	Toxicity Prediction	en_US
dc.subject	Classification Models	en_US
dc.subject	Boosting Algorithms	en_US
dc.title	Prediction of Social media text-based comments toxicity during Sri Lankan Crisis with Boosting Algorithms	en_US
dc.type	Thesis	en_US
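
The abstract names Bag-of-words and TF-IDF features feeding boosted binary classifiers. The sketch below illustrates that general idea with the Python standard library only; it is not the dissertation's code. The four toy comments and their labels are invented for illustration, the helper names (`bag_of_words`, `tf_idf`, `adaboost_fit`, `adaboost_predict`) are hypothetical, the idf uses one common smoothed variant, and the classifier is a textbook discrete AdaBoost over word-presence stumps rather than any of the library implementations named in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real pipelines usually do more cleaning.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of strings."""
    vocab = sorted({tok for d in docs for tok in tokenize(d)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for d in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(d)).items():
            row[index[tok]] = n
        vectors.append(row)
    return vocab, vectors


def tf_idf(counts):
    """Reweight counts with a smoothed idf: ln((1 + N) / (1 + df)) + 1."""
    n_docs, n_terms = len(counts), len(counts[0])
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


def adaboost_fit(X, y, rounds=5):
    """Discrete AdaBoost with one-feature stumps; labels y are -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Each stump predicts `sign` when feature j is present, else -sign.
        for j in range(len(X[0])):
            for sign in (1, -1):
                preds = [sign if row[j] > 0 else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, sign, preds)
        err, j, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) or division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, sign))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * t * p) for wi, p, t in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    score = sum(a * (s if row[j] > 0 else -s) for a, j, s in model)
    return 1 if score >= 0 else -1


# Invented toy data: +1 = toxic, -1 = non-toxic.
comments = [
    "you are an idiot",
    "thank you for sharing",
    "what an idiot move",
    "great update thank you",
]
labels = [1, -1, 1, -1]

vocab, counts = bag_of_words(comments)
features = tf_idf(counts)
model = adaboost_fit(features, labels)
predictions = [adaboost_predict(model, row) for row in features]
```

On data this small the ensemble simply memorizes the training set, so `predictions` says nothing about real-world accuracy; a comparative study such as the one described would evaluate on held-out comments.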