Digital Repository

A Novel Approach to Improving the Robustness of Visual Question Answering Systems


dc.contributor.author Kodippily, Methmi
dc.date.accessioned 2026-03-23T06:39:16Z
dc.date.available 2026-03-23T06:39:16Z
dc.date.issued 2025
dc.identifier.citation Kodippily, Methmi (2025) A Novel Approach to Improving the Robustness of Visual Question Answering Systems. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2019383
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/3024
dc.description.abstract Visual question answering (VQA) systems aim to interpret natural language questions about images and generate accurate answers by leveraging both visual and textual information. Despite their growing popularity, however, existing VQA models face significant performance challenges due to biases inherent in their training datasets. Current debiasing approaches often struggle to maintain robustness across different distribution settings; moreover, some methods that are robust in out-of-distribution (OOD) scenarios achieve this at the expense of in-distribution performance. To address this challenge, this research proposes a novel architecture that combines transfer learning and debiasing techniques to create a more balanced and robust VQA model. The implementation uses the Xception network to extract high-level image features and the SBERT model to capture semantic question embeddings. These features are fused using a pointwise multiplication strategy to form a multimodal representation, which is then passed through a classifier trained on the VQA v2 dataset. A class-balanced loss function counteracts answer-frequency bias by inversely weighting the loss contributions of frequent and infrequent answers. Finally, the entire system was encapsulated in a web-based application with a Flask backend and a React frontend, allowing end users to interactively upload images, pose questions, and receive model-generated answers. Extensive testing measured how well the system performs on both familiar (in-distribution) and unseen (out-of-distribution) data. While the baseline model without debiasing slightly outperformed on standard accuracy, it struggled with rare question types in the OOD dataset. The debiased model, however, showed better balance across different answer types, improved accuracy for underrepresented classes, and achieved a 1.69% boost in tail accuracy on the GQA-OOD benchmark. These gains came without significantly lowering performance on common answers. Overall, the results show that combining class-balanced debiasing with pretrained models yields a more robust and fairer VQA system. en_US
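The abstract's two core mechanisms, pointwise-multiplication fusion of the two feature streams and a class-balanced loss weighted inversely to answer frequency, can be sketched as follows. This is a minimal illustrative sketch only, not the dissertation's implementation: the dimensions, random projection weights, and helper names (`project`, `class_balanced_ce`) are assumptions, and trained Xception/SBERT outputs are replaced by random vectors of plausible size.

```python
import numpy as np

# Illustrative dimensions (assumptions): Xception pooled features are
# 2048-d, SBERT sentence embeddings are 384-d; the fused size and answer
# vocabulary size here are placeholders.
IMG_DIM, TXT_DIM, FUSED_DIM, NUM_ANSWERS = 2048, 384, 512, 1000

rng = np.random.default_rng(0)

def project(x, w):
    """Linear projection followed by ReLU (stand-in for a trained layer)."""
    return np.maximum(x @ w, 0.0)

# Randomly initialised projection weights, in place of learned parameters.
w_img = rng.normal(scale=0.01, size=(IMG_DIM, FUSED_DIM))
w_txt = rng.normal(scale=0.01, size=(TXT_DIM, FUSED_DIM))

img_feat = rng.normal(size=(IMG_DIM,))   # stand-in for Xception output
txt_feat = rng.normal(size=(TXT_DIM,))   # stand-in for SBERT embedding

# Multimodal fusion by pointwise (Hadamard) multiplication, as described.
fused = project(img_feat, w_img) * project(txt_feat, w_txt)

# Class-balanced weights: each answer class is weighted inversely to its
# frequency in the (here, simulated) training set, so rare answers
# contribute more to the loss; weights are normalised to mean 1.
answer_counts = rng.integers(1, 10_000, size=NUM_ANSWERS)
class_weights = 1.0 / answer_counts
class_weights *= NUM_ANSWERS / class_weights.sum()

def class_balanced_ce(logits, target, weights):
    """Frequency-weighted cross-entropy loss for a single example."""
    z = logits - logits.max()                    # numerical stability
    log_probs = z - np.log(np.exp(z).sum())      # log-softmax
    return -weights[target] * log_probs[target]
```

With uniform logits, the weighted loss for a rare answer class exceeds that for a frequent one, which is exactly the rebalancing effect the abstract attributes to the class-balanced loss.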
dc.language.iso en en_US
dc.subject Transfer Learning en_US
dc.subject NLP en_US
dc.subject Computer Vision en_US
dc.title A Novel Approach to Improving the Robustness of Visual Question Answering Systems en_US
dc.type Thesis en_US

