Abstract:
This research project presents a novel approach to reducing hallucination in the summarization of long biomedical texts. Hallucination, in which a summarization model generates information that is absent from or unsupported by the source document, undermines the reliability and accuracy of generated summaries. This is particularly problematic in biomedical literature, where precision and factual consistency are paramount. The project combines Contrastive Parameter Ensembling (CAPE) with Named Entity Recognition (NER) to build a robust system for biomedical text summarization. The CAPE method, which has been shown to reduce hallucination, is enhanced with NER-based factual metrics that refine the selection of clean and noisy subsets from the training data. The ensemble model, which combines the base model with the expert and anti-expert models, improves on the factual accuracy of the base model. Its BERTScore is 65.13%, up from the base model's 64.61% (+0.52 points). The ROUGE-1 score, a widely used measure of summary quality, rises from 39.55% to 40.34% (+0.79 points). The precision score, which reflects factual consistency, improves from 79.19% to 82.33% (+3.14 points). The proposed system thus generates more accurate and concise abstractive summaries of long biomedical documents, addressing the hallucination problem described above.
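
As a rough illustration of the two ideas the abstract combines, the following Python sketch shows (a) an entity-level precision score used to split training examples into clean and noisy subsets, and (b) a contrastive parameter ensemble that shifts the base model toward the expert and away from the anti-expert. It is a minimal sketch under stated assumptions, not the project's implementation: the function names, the fixed `threshold`, the mixing weight `alpha`, and the use of plain floats in place of model tensors are all hypothetical, and entities are assumed to have been extracted already by some NER system.

```python
from typing import Dict, List, Set, Tuple

# (source entities, summary entities), assumed to come from an NER system.
Example = Tuple[Set[str], Set[str]]


def entity_precision(source_entities: Set[str], summary_entities: Set[str]) -> float:
    """Fraction of summary entities that also occur in the source document."""
    if not summary_entities:
        # Assumption: a summary without entities has nothing to hallucinate.
        return 1.0
    return len(summary_entities & source_entities) / len(summary_entities)


def split_clean_noisy(
    examples: List[Example], threshold: float = 0.8
) -> Tuple[List[Example], List[Example]]:
    """Split examples by entity precision (the threshold is a hypothetical choice)."""
    clean: List[Example] = []
    noisy: List[Example] = []
    for source_entities, summary_entities in examples:
        score = entity_precision(source_entities, summary_entities)
        target = clean if score >= threshold else noisy
        target.append((source_entities, summary_entities))
    return clean, noisy


def cape_ensemble(
    base: Dict[str, float],
    expert: Dict[str, float],
    anti_expert: Dict[str, float],
    alpha: float = 1.0,
) -> Dict[str, float]:
    """Adjust base parameters by the scaled expert minus anti-expert difference."""
    return {
        name: base[name] + alpha * (expert[name] - anti_expert[name])
        for name in base
    }
```

In this sketch the expert would be the base model fine-tuned on the clean subset and the anti-expert the base model fine-tuned on the noisy subset; with real models the same arithmetic would be applied to each parameter tensor rather than to single floats.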