dc.description.abstract |
Today, much communication takes place on social media platforms such as Twitter,
Facebook and YouTube. As a result, large volumes of information are produced
daily on social media platforms and online forums. Comments are a way of
expressing one's opinion about content on those platforms. In some cases, however,
people use these platforms to spread disrespectful, harmful views about others by
posting abusive content. This negatively affects content creators and other viewers
and can lead to the entire comment section being closed to all users. Such comments
are called toxic and should be prevented wherever possible. There are many
studies on identifying toxicity and filtering out such comments using various
methodologies. Machine learning techniques are the most widely used in this
context, but little research has evaluated those methodologies comparatively.
This research analyzes text-based comments made during the Sri Lankan Crisis,
which is still ongoing.
This study discusses which classification algorithms can be used in this problem
domain and focuses only on text-based comments, analyzing three separate datasets
retrieved manually by web scraping Facebook. Those datasets contained a fairly
large number of comments, and six classification models were trained and tested on
each dataset, taking the distribution of the dataset into account as well.
Bag-of-words (BoW) and TF-IDF were used as the feature extraction techniques.
Boosting algorithms (Gradient Boosting, CatBoost, XGBoost and AdaBoost), stacked
models and hybrid models were used as the classification algorithms trained on the
datasets. This study is based on a supervised machine learning approach that uses
binary classification. |
en_US |