Fairness in Multilingual Toxicity Detection

Paskkaran, Panuja

Fairness in Multilingual Toxicity Detection

Paskkaran, Panuja

URI: http://dlib.iit.ac.lk/xmlui/handle/123456789/3114

Date: 2025

Abstract:

As toxic and harmful content on online platforms is an ever-growing concern, the cross-lingual aspect of it makes it even graver in nature. Existing toxicity detection models are mostly English centric and tend to have limited performance across different languages, leading to biased moderation. This discrepancy results in inappropriate flagging or ignoring of the content in less-represented languages. Moreover, many toxicity detection datasets suffer from class imbalance or lack cultural context, making it challenging to produce a system that is accurate and fair across languages. To address these difficulties, we employ the XLM-RoBERTa Model an advanced transformer architecture for multi-lingual tasks. The dataset was cleaned, then augmented in two ways, back-translation in five languages (French, Italian, Turkish, Russian, Portugues) and synonym replacement using a BERT based contextual augmenter. These techniques provided additional data diversity and compensated the class imbalance. This new dataset of 100k samples was tokenized, stratified to train/ validation/ test sets, trained with Hugging Face’s Trainer API and adamw optimizer trained in mixed precision with gradient accumulation to make better use of a single GPU. After training, the fine-tuned model performed exceptionally well on tests, achieving 91.4% accuracy, 95.6% precision, 95.6% recall, and an F1-score of 95.6%. These metrics demonstrate high, consistent performance across all six toxicity categories. The combination with a multilingual transformer model yields significant improvements in fairness as well as classification performance in toxicity detection system

Show full item record