Abstract:
The rapid spread of fake news across digital platforms has emerged as a significant risk to
societal trust, community safety, and political stability. Misinformation campaigns often exploit
multiple media formats, such as text, images, and videos, making it increasingly difficult to
distinguish trustworthy from misleading news content. Traditional approaches to fake news
detection focus primarily on textual analysis, leaving gaps in detecting multimodal
misinformation.
The proposed system evaluates several advanced unimodal and multimodal models: CLIP (OpenAI),
ViLT, BERT, ResNet50, and VisualBERT. Text is processed using BERT, while ResNet50 and
SAFE handle image features. ViLT and VisualBERT model text-image relationships, and CLIP
aligns visual and textual semantics. After all models are evaluated, the best-performing ones are
combined to improve accuracy and generalization. Explainability is provided through SHAP or
LIME, helping users understand the reasoning behind each prediction.
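
The abstract does not state how the best-performing models are combined. As a minimal sketch, assuming each model outputs a probability that a post is fake, one common option is soft voting, which averages those probabilities before thresholding. The function name `soft_vote`, the model keys, and the scores below are hypothetical and are not taken from the prototype.

```python
from typing import Dict, List, Optional


def soft_vote(
    probs_by_model: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> List[int]:
    """Combine per-model P(fake) scores by weighted averaging (soft voting).

    Returns 1 (fake) where the averaged probability reaches `threshold`,
    otherwise 0 (real).
    """
    if not probs_by_model:
        raise ValueError("at least one model is required")
    names = list(probs_by_model)
    n = len(probs_by_model[names[0]])
    if any(len(probs_by_model[m]) != n for m in names):
        raise ValueError("all models must score the same number of posts")
    if weights is None:
        # Assumption: equal weights unless validation results suggest otherwise.
        weights = {m: 1.0 for m in names}
    total = sum(weights[m] for m in names)

    labels = []
    for i in range(n):
        avg = sum(weights[m] * probs_by_model[m][i] for m in names) / total
        labels.append(1 if avg >= threshold else 0)
    return labels


# Hypothetical scores from three of the evaluated models on four posts.
scores = {
    "clip": [0.9, 0.2, 0.6, 0.4],
    "vilt": [0.8, 0.1, 0.3, 0.7],
    "visualbert": [0.7, 0.3, 0.7, 0.2],
}
print(soft_vote(scores))  # [1, 0, 1, 0]
```

Weights could instead be set from each model's validation accuracy. Whether the prototype uses averaging, stacking, or another scheme is not specified here.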
This approach is demonstrated through a prototype and assessed using standard metrics such as
accuracy, recall, precision, and F1-score. The initial model, without zero-shot learning,
achieved strong performance on the Twitter dataset, with an accuracy of 97.61%, precision of
97.62%, recall of 97.61%, and F1-score of 97.61%. When evaluated with zero-shot learning
on the FakeNewsNet dataset, the enhanced model achieved 93.91% accuracy. The proposed
solution promises to be an effective tool for combating misinformation by providing a robust,
explainable system for fake news detection across diverse media formats. Future work includes
extending the model to handle real-time data and multimedia content and further
improving its efficiency and adaptability in dynamic environments.
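
For reference, the four metrics named above can be computed from binary predictions as in the sketch below. It assumes label 1 means fake and 0 means real, and it scores the fake class only. The abstract does not say which averaging scheme (binary, macro, or weighted) produced the reported figures, so this is illustrative rather than a reproduction of them. The function name `binary_metrics` and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive (fake = 1) class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake, flagged as fake
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real, flagged as fake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real, passed as real

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; not data from either dataset.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics equal 0.75.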