MedSum : A Text Summarization System to address the challenge of hallucination in long biomedical documents.

Rasanayagam, Fabian

dc.contributor.author	Rasanayagam, Fabian
dc.date.accessioned	2025-06-06T05:35:05Z
dc.date.available	2025-06-06T05:35:05Z
dc.date.issued	2024
dc.identifier.citation	Rasanayagam, Fabian (2024) MedSum : A Text Summarization System to address the challenge of hallucination in long biomedical documents. BSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	20200206
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2457
dc.description.abstract	This research project presents an approach to reducing hallucination in the summarization of long biomedical texts. Hallucination, in which a summarization model generates information that is not present in or supported by the source document, undermines the reliability and accuracy of generated summaries. This is particularly problematic in biomedical literature, where precision and factual consistency are paramount. The project combines Contrastive Parameter Ensembling (CAPE) and Named Entity Recognition (NER) to build a system for biomedical text summarization. The CAPE method, which has been reported to reduce hallucination, is extended with NER-based factual metrics. These metrics are used to refine the selection of clean and noisy subsets from the training data, with the aim of improving the factual accuracy of the summaries. The research demonstrates an improvement in the factual accuracy of the base model. The ensemble model, built from the base model together with the expert and anti-expert models, scored 65.13% on BERTScore, up from the base model's 64.61%. The ROUGE-1 score, a widely used metric for evaluating summary quality, increased from 39.55% to 40.34%. The precision score, which indicates factual consistency, improved from 79.19% to 82.33%. The proposed system therefore generates more accurate and concise abstract summaries from long biomedical documents than the base model does.	en_US
dc.language.iso	en	en_US
dc.subject	Biomedical Text Summarization	en_US
dc.subject	Natural Language Processing	en_US
dc.subject	Hallucination	en_US
dc.title	MedSum : A Text Summarization System to address the challenge of hallucination in long biomedical documents.	en_US
dc.type	Thesis	en_US
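
Illustrative sketch (not taken from the dissertation): the abstract describes two ideas, an NER-based precision metric used to separate clean from noisy training examples, and a CAPE-style ensemble of base, expert and anti-expert models. The Python below is a minimal sketch of both under stated assumptions. It assumes the entities have already been extracted by some NER tool, that entity precision is the fraction of summary entities found in the source text, and that the ensemble follows the commonly cited CAPE form base + alpha * (expert - anti_expert). The function names, the thresholds and the value of alpha are hypothetical, and plain lists of floats stand in for model weights.

```python
from typing import Dict, List, Set, Tuple

Params = Dict[str, List[float]]  # stand-in for a model's named weight tensors


def entity_precision(summary_entities: Set[str], source_text: str) -> float:
    """Fraction of summary entities that also appear in the source text.

    Assumption: simple case-insensitive substring matching; a real system
    would likely normalise biomedical entity mentions first.
    """
    if not summary_entities:
        return 1.0  # nothing asserted, so nothing unsupported
    source_lower = source_text.lower()
    supported = sum(1 for e in summary_entities if e.lower() in source_lower)
    return supported / len(summary_entities)


def split_clean_noisy(
    scores: List[float], clean_min: float, noisy_max: float
) -> Tuple[List[int], List[int]]:
    """Return indices of clean (high precision) and noisy (low precision) examples.

    The two thresholds are hypothetical; examples in between are left out.
    """
    clean = [i for i, s in enumerate(scores) if s >= clean_min]
    noisy = [i for i, s in enumerate(scores) if s <= noisy_max]
    return clean, noisy


def cape_combine(
    base: Params, expert: Params, anti_expert: Params, alpha: float
) -> Params:
    """CAPE-style parameter ensemble: base + alpha * (expert - anti_expert)."""
    return {
        name: [
            b + alpha * (e - a)
            for b, e, a in zip(base[name], expert[name], anti_expert[name])
        ]
        for name in base
    }
```

In this sketch the expert would be the base model fine-tuned on the clean subset and the anti-expert the base model fine-tuned on the noisy subset; alpha controls how far the ensemble moves away from the anti-expert's behaviour.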