Digital Repository

MedSum : A Text Summarization System to address the challenge of hallucination in long biomedical documents.

dc.contributor.author Rasanayagam, Fabian
dc.date.accessioned 2025-06-06T05:35:05Z
dc.date.available 2025-06-06T05:35:05Z
dc.date.issued 2024
dc.identifier.citation Rasanayagam, Fabian (2024) MedSum : A Text Summarization System to address the challenge of hallucination in long biomedical documents. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20200206
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2457
dc.description.abstract This research project presents a novel approach to the problem of hallucination in the summarization of long biomedical texts. Hallucination, in which a summarization model generates information that is not present in or supported by the source document, undermines the reliability and accuracy of generated summaries. This is particularly problematic in biomedical literature, where precision and factual consistency are paramount. The project combines Contrastive Parameter Ensembling (CAPE) and Named Entity Recognition (NER) to build a robust system for biomedical text summarization. The CAPE method, known for its effectiveness in reducing hallucination, is extended with NER-based factual metrics, which are used to refine the selection of clean and noisy subsets from the training data. The research shows an improvement in factual accuracy over the base model. The ensemble model, obtained by fine-tuning the base model with the expert and anti-expert models, scored 65.13% on BERTScore, up from the base model’s 64.61%. The ROUGE-1 score, a widely used metric for summary quality, rose from 39.55% to 40.34%. The precision score, which indicates factual consistency, improved from 79.19% to 82.33%. The proposed system therefore generates more accurate and concise abstractive summaries from long biomedical documents, addressing the hallucination problem in biomedical text summarization. en_US
dc.language.iso en en_US
dc.subject Biomedical Text Summarization en_US
dc.subject Natural Language Processing en_US
dc.subject Hallucination en_US
dc.title MedSum : A Text Summarization System to address the challenge of hallucination in long biomedical documents. en_US
dc.type Thesis en_US
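
The abstract describes two steps: scoring training pairs with an NER-based factual metric to separate clean from noisy examples, and combining a base model with expert and anti-expert models. The sketch below illustrates both on toy data. It is not the dissertation's implementation. The gazetteer lookup is a stand-in for a real NER model, the threshold is arbitrary, and the combination rule base + alpha * (expert - anti_expert) is an assumption about how CAPE-style ensembling is commonly written. All function and parameter names are hypothetical.

```python
def extract_entities(text, gazetteer):
    """Toy stand-in for NER: return gazetteer terms that occur in the text."""
    lowered = text.lower()
    return {term for term in gazetteer if term in lowered}


def entity_precision(source, summary, gazetteer):
    """Fraction of summary entities that also appear in the source."""
    summary_entities = extract_entities(summary, gazetteer)
    if not summary_entities:
        return 1.0  # no entities in the summary, so none can be unsupported
    source_entities = extract_entities(source, gazetteer)
    return len(summary_entities & source_entities) / len(summary_entities)


def split_clean_noisy(pairs, gazetteer, threshold=1.0):
    """Split (source, summary) pairs by entity precision (hypothetical threshold)."""
    clean, noisy = [], []
    for source, summary in pairs:
        score = entity_precision(source, summary, gazetteer)
        (clean if score >= threshold else noisy).append((source, summary))
    return clean, noisy


def cape_ensemble(base, expert, anti_expert, alpha):
    """Assumed rule: shift base parameters along the expert minus anti-expert direction."""
    return {
        name: base[name] + alpha * (expert[name] - anti_expert[name])
        for name in base
    }


# Toy data: plain floats stand in for model weight tensors.
GAZETTEER = {"aspirin", "warfarin", "stroke"}
SOURCE = "Aspirin lowered the risk of stroke in the trial."
FAITHFUL = "Aspirin reduced stroke risk."
HALLUCINATED = "Warfarin and aspirin reduced stroke risk."

clean, noisy = split_clean_noisy(
    [(SOURCE, FAITHFUL), (SOURCE, HALLUCINATED)], GAZETTEER
)
merged = cape_ensemble({"w": 1.0}, {"w": 1.5}, {"w": 0.5}, alpha=0.5)
```

In this toy setup the faithful summary lands in the clean subset and the summary that mentions an unsupported drug lands in the noisy one. With real models, the same arithmetic would be applied per parameter tensor rather than to single floats.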

