Abstract:
Problem: Biomedical Named Entity Recognition (BioNER) is essential for tasks such as information extraction,
clinical decision-making, and literature mining. However, most current BioNER systems rely on surface-form
matching, which limits their ability to identify disease terms that are synonymous but expressed in varied ways
across the biomedical literature. This limitation leads to missed entities and reduced usefulness in downstream
applications. To improve semantic understanding and span-level synonym retrieval, this study introduces a
collaborative learning architecture that combines BioNER with contrastive learning for synonym generalization.
Methodology: The proposed approach combines a fine-tuned BioBERT model for disease entity recognition with a
contrastive learning module that embeds and retrieves disease synonyms using span-level representations. After
preprocessing steps such as sentence segmentation, BIO tagging, and synonym-based data augmentation, the model
is trained on two publicly available datasets, the NCBI Disease Corpus and BC5CDR. The model is served through a
FastAPI backend and deployed as a Chrome extension that performs real-time disease recognition and synonym
suggestion. Evaluation uses precision, recall, F1-score, Recall@5, and Mean Reciprocal Rank (MRR).
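
For readers unfamiliar with the two retrieval metrics, the following is a minimal sketch of how Recall@5 and MRR
could be computed. It assumes a hit-rate definition of Recall@k (a query counts as a hit if at least one gold
synonym appears among the top k candidates) and takes the reciprocal rank of the first gold synonym for MRR. The
function names and the toy data are hypothetical and are not taken from the study's code.

```python
from typing import List, Sequence, Set, Tuple

Query = Tuple[Sequence[str], Set[str]]  # (ranked candidates, gold synonyms)


def hit_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """Return 1.0 if any gold synonym is in the top-k candidates, else 0.0."""
    return 1.0 if any(c in gold for c in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1/rank of the first gold synonym, or 0.0 if none is retrieved."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            return 1.0 / rank
    return 0.0


def evaluate(queries: List[Query], k: int = 5) -> Tuple[float, float]:
    """Average Recall@k (hit-rate variant) and MRR over all queries."""
    if not queries:
        return 0.0, 0.0
    recall = sum(hit_at_k(r, g, k) for r, g in queries) / len(queries)
    mrr = sum(reciprocal_rank(r, g) for r, g in queries) / len(queries)
    return recall, mrr


if __name__ == "__main__":
    # Toy example (assumed data): each query is a disease mention with
    # a ranked list of retrieved candidate synonyms.
    toy = [
        (["heart attack", "myocardial infarction", "angina"], {"myocardial infarction"}),
        (["flu", "cold", "fever", "cough", "sore throat", "influenza"], {"influenza"}),
        (["neoplasm"], {"neoplasm"}),
    ]
    recall5, mrr = evaluate(toy, k=5)
    print(f"Recall@5 = {recall5:.4f}, MRR = {mrr:.4f}")
```

Under this definition, a gold synonym ranked sixth contributes nothing to Recall@5 but still adds 1/6 to MRR, so the
two metrics can diverge for the same set of queries.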
Results: The contrastive head achieved a Recall@5 of 0.5506 and an MRR of 0.5208 for synonym retrieval, while the
BioNER model achieved a precision of 76%, a recall of 59%, and an F1-score of 67% for disease recognition. These
results indicate that the integrated strategy can enhance synonym-aware entity recognition. The browser-based
deployment also improves real-world usability by making biomedical literature more accessible to researchers,
students, and medical professionals.