Abstract:
Artificial Intelligence (AI) now underpins much of modern technology, most visibly through Natural Language Processing (NLP) models. However, these models frequently inherit societal biases from their training data, leading to unfair or unethical outcomes. Many existing bias detection methods struggle to identify subtle, context-specific biases, making it difficult for users to interpret and address them. In addition, the lack of clear explanations in bias detection reduces trust and usability, particularly in high-stakes decision-making settings.
To address these issues, this research presents Bias Lens, a framework that improves bias detection through Explainable AI (XAI) techniques. Bias Lens applies multi-label token classification to detect different types of bias at a fine-grained, per-token level. It also uses Integrated Gap Gradients (IG2) and other advanced XAI methods to explain why particular spans of text are flagged as biased. Interactive visualizations help users, such as AI auditors and content moderators, explore and analyze bias patterns more effectively.
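The two mechanisms named above can be illustrated with a minimal sketch. The weights, token embedding, and bias-type labels below are all hypothetical placeholders, not the actual Bias Lens model: the sketch only shows the general shape of per-token multi-label scoring (one independent sigmoid per bias type) and an Integrated Gradients-style attribution computed along a straight-line path from a baseline, whose per-dimension scores indicate how much each input feature contributed to the bias score.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights: 3 embedding dims -> 2 bias types (made-up values).
W = np.array([[2.0, -1.0, 0.5],
              [0.3,  1.2, -0.7]])

def bias_scores(x):
    """Multi-label scores for one token embedding x: one sigmoid per bias type."""
    return sigmoid(W @ x)

def grad_score(x, label):
    """Gradient of the chosen label's score with respect to the embedding."""
    s = sigmoid(W[label] @ x)
    return s * (1.0 - s) * W[label]

def integrated_gradients(x, baseline, label, steps=200):
    """Midpoint-rule approximation of Integrated Gradients along the
    straight-line path from the baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.zeros_like(x)
    for a in alphas:
        avg_grad += grad_score(baseline + a * (x - baseline), label)
    avg_grad /= steps
    return (x - baseline) * avg_grad  # per-dimension attribution

token = np.array([1.0, 0.3, -0.2])   # hypothetical token embedding
baseline = np.zeros_like(token)      # all-zero reference input
scores = bias_scores(token)
labels = (scores > 0.5).astype(int)  # independent thresholds -> multi-label output
attr = integrated_gradients(token, baseline, label=0)

# Completeness axiom: attributions sum to score(x) - score(baseline).
print(labels, attr.sum())
```

The completeness check at the end is the property that makes such attribution scores interpretable: the per-feature contributions account exactly for the change in the bias score relative to the baseline, which is what allows individual words to be highlighted in proportion to their influence.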
The Bias Lens model was evaluated and achieved 93% accuracy with optimized hyperparameters. Because comparable XAI-based approaches are rare, direct quantitative comparisons were limited, but qualitative feedback exceeded expectations. The XAI techniques successfully highlighted biased words through attribution scores, improving the transparency of bias detection. Expert evaluations confirmed that Bias Lens effectively addresses key challenges, making it a significant step forward for Ethical AI and fair NLP models.