Disease Synonym Generalization in BioNER

Sivanesathurai, Tharagan

URI: http://dlib.iit.ac.lk/xmlui/handle/123456789/3157

Date: 2025

Abstract:

Problem: Biomedical Named Entity Recognition (BioNER) is essential for tasks such as information extraction, clinical decision-making, and literature mining. However, most current BioNER systems rely largely on surface-form matching, which limits their ability to recognize synonymous disease terms that are expressed in varied ways across the biomedical literature. This weakness leads to missed entities and reduces usefulness in downstream applications. To improve semantic understanding and span-level synonym retrieval, this study introduces a collaborative learning architecture that combines BioNER with contrastive learning for synonym generalization.

Methodology: The proposed approach integrates a fine-tuned BioBERT model for disease entity recognition with a contrastive learning module designed to embed and retrieve disease synonyms using span-level representations. After preprocessing steps such as sentence segmentation, BIO tagging, and synonym-based data augmentation, the model is trained on two publicly available datasets, the NCBI Disease Corpus and BC5CDR. The model is served through a FastAPI backend and made available through a Chrome extension that performs real-time disease recognition and synonym suggestion. Evaluation uses precision, recall, F1-score, Recall@5, and Mean Reciprocal Rank (MRR).

Results: For disease recognition, the BioNER model achieved a precision of 76%, a recall of 59%, and an F1-score of 67%. For synonym retrieval, the contrastive head achieved a Recall@5 of 0.5506 and an MRR of 0.5208. These results demonstrate the effectiveness of the integrated strategy for synonym-aware entity recognition. The browser-based deployment improves real-world usability and makes biomedical material easier to access for researchers, students, and medical professionals.
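The two retrieval metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the common "hit rate" reading of Recall@k (a query counts as a hit if any gold synonym appears among the top k candidates); the abstract does not state which definition the thesis uses. The function names and the toy queries are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """Return 1.0 if any gold synonym is in the top-k candidates, else 0.0.

    Assumption: hit-rate style Recall@k; other definitions exist.
    """
    return 1.0 if any(c in gold for c in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1/rank of the first gold synonym, or 0.0 if none is retrieved."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            return 1.0 / rank
    return 0.0


def evaluate(queries: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> Dict[str, float]:
    """Average both metrics over (ranked candidates, gold synonyms) pairs."""
    if not queries:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    n = len(queries)
    return {
        "recall_at_k": sum(recall_at_k(r, g, k) for r, g in queries) / n,
        "mrr": sum(reciprocal_rank(r, g) for r, g in queries) / n,
    }


# Hypothetical toy data: ranked retrieval output and gold synonyms per query.
toy_queries = [
    (["heart attack", "stroke", "angina"], {"heart attack"}),            # gold at rank 1
    (["flu", "cold", "asthma", "copd", "tb", "myocardial infarction"],
     {"myocardial infarction"}),                                         # gold at rank 6
    (["eczema", "psoriasis"], {"hypertension"}),                         # gold not retrieved
]

scores = evaluate(toy_queries, k=5)
print(scores)
```

On this toy data only the first query is a hit within the top 5, and the reciprocal ranks are 1, 1/6, and 0, so MRR rewards a correct synonym at rank 6 that Recall@5 ignores.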
