| dc.description.abstract |
The rapid advancement of large language models has made AI-generated text increasingly difficult to distinguish from human-written content, raising concerns around content authenticity, misuse, and trust. Existing detection systems often function as “black boxes,” offering limited interpretability and minimal insight into how predictions are made. This lack of transparency undermines user confidence, especially in domains where explainability is essential. To address these challenges, this study introduces AletheiaAI, a novel AI-generated text detection system designed not only to classify text as human- or AI-generated but also to provide clear, user-centric explanations that enhance interpretability and trust.
AletheiaAI fine-tunes the OPT-1.3B model with Low-Rank Adaptation (LoRA), which keeps fine-tuning efficient by freezing the pretrained weights and training only small low-rank adapter matrices. Text is preprocessed through tokenization, stopword filtering, and cleaning before being transformed into embeddings for binary classification. To support explainability, the system incorporates a dual-explanation framework: LIME provides token-level visual explanations, while Mistral-7B-Instruct generates natural-language justifications to clarify the reasoning behind each prediction. The model was trained on monolingual English data from the M4 dataset, covering sources such as Wikipedia, Reddit, WikiHow, and arXiv abstracts, using a 70/20/10 train/validation/test split. Performance was evaluated using standard metrics: accuracy, precision, recall, and F1 score.
Initial results show that AletheiaAI achieves strong detection performance: 94.49% accuracy, with precision, recall, and F1 score each at 94%. The combined use of visual and textual explanations proved effective in helping users understand model decisions, thereby strengthening transparency and trust. These findings highlight AletheiaAI’s potential as a practical, responsible solution for AI-generated text detection, particularly in contexts requiring both high accuracy and interpretability. |
en_US |