Digital Repository

DeepIssueMatch: A Token-Interaction and LLM-Based Framework for Bug Report Similarity and Triaging


dc.contributor.author Herath, Dinith
dc.date.accessioned 2026-03-11T05:58:07Z
dc.date.available 2026-03-11T05:58:07Z
dc.date.issued 2025
dc.identifier.citation Herath, Dinith (2025) DeepIssueMatch: A Token-Interaction and LLM-Based Framework for Bug Report Similarity and Triaging. MSc Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20230583
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2923
dc.description.abstract The presence of duplicate bug reports in large-scale software repositories continues to hinder efficient triaging and resource allocation. Traditional similarity detection techniques often lack the semantic depth required to accurately identify duplicates in natural language bug descriptions. This research addresses the challenge by proposing DeepIssueMatch, a token-interaction-based framework designed to semantically retrieve and rank similar bug reports. The system was implemented as a modular pipeline consisting of a Sentence-BERT (SBERT) based semantic retriever, an optional reranker, and a lightweight language model for response formulation. Comparative evaluations were conducted against classical information retrieval models such as TF-IDF and BM25, alongside other semantic baselines including GloVe and BERT. While advanced models such as ColBERT were considered, their high computational complexity and inference overhead made them unsuitable for deployment in the target setting. The architecture was deployed through a FastAPI interface, and experiments were performed on a labeled HBase bug report dataset. The results showed that SBERT alone achieved a Recall@10 of approximately 56%, which improved to over 61% when augmented with a reranker. Classical models such as BM25 and TF-IDF yielded Recall@10 scores of around 55% and 51%, respectively, while shallow embedding-based methods remained below 30%. These findings indicate that SBERT-based retrieval provides a practical balance between performance and scalability for duplicate detection in bug triaging systems. Furthermore, fine-tuning SBERT on a specific dataset yielded still higher Recall@K, making it the preferred option for real-world deployment. en_US
dc.language.iso en en_US
dc.subject Semantic Retrieval en_US
dc.subject Bug Report Deduplication en_US
dc.subject Large Language Models en_US
dc.title DeepIssueMatch: A Token-Interaction and LLM-Based Framework for Bug Report Similarity and Triaging en_US
dc.type Thesis en_US
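
The abstract reports its results as Recall@10. As a rough illustration of how such a figure can be computed, the sketch below assumes one common convention for duplicate detection: a query counts as a hit if at least one of its known duplicates appears among the top K retrieved reports. The record does not state which Recall@K definition the thesis uses, and the function name, data layout, and report IDs are all hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(
    rankings: Dict[str, List[str]],
    duplicates: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of queries with at least one known duplicate in the top k.

    Assumed definition; the thesis may define Recall@K differently.
    """
    # Only queries that have at least one labeled duplicate are scored.
    scored = [q for q in rankings if duplicates.get(q)]
    if not scored:
        return 0.0
    hits = sum(
        1
        for q in scored
        if any(doc in duplicates[q] for doc in rankings[q][:k])
    )
    return hits / len(scored)


# Hypothetical report IDs, not taken from the HBase dataset.
rankings = {
    "BUG-1": ["BUG-7", "BUG-3", "BUG-9"],  # ranked retriever output
    "BUG-2": ["BUG-4", "BUG-8", "BUG-5"],
}
duplicates = {"BUG-1": {"BUG-3"}, "BUG-2": {"BUG-6"}}

print(recall_at_k(rankings, duplicates, k=2))  # 0.5
```

Under this convention, a reranker can only raise Recall@K by moving a true duplicate from below rank K into the top K, so it has to be applied to a candidate list longer than K.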

