Abstract:
This study aimed to improve the performance of GlossBERT, a leading Word Sense Disambiguation (WSD) model, by incorporating Named Entity Recognition (NER) tags and synonyms into the training phase. Because of resource constraints, a compact version of the SemCor dataset was used, from which two versions were prepared. The first retained its original form and was used to train the conventional GlossBERT model, which served as the baseline. The second was augmented with NER tags from spaCy and synonyms from the Natural Language Toolkit (NLTK), and the model was modified accordingly to integrate these features. Both models were evaluated on the SemEval-2007, SemEval-2013, and SemEval-2015 datasets, chosen for their comprehensive and structured approach to semantic analysis and evaluation. The baseline achieved an average F1 score of 55.21%, while this novel approach raised the average F1 score of the enhanced GlossBERT model to 59.84%, an improvement of 4.63 percentage points. The results suggest that integrating NER tags and synonyms can improve the model's ability to disambiguate word senses. Although the study was limited by the small size of the training set, a consequence of the resource limitations the author faced, the modified model's performance indicates potential for further accuracy gains with access to more extensive training resources. This study highlights the value of contextual clues such as NER tags, and of lexical variety through synonyms, for improving WSD models.