Abstract:
People who regularly work with academic papers often need to summarize and compare their contents, an exhausting and challenging task that may contribute to mental stress, or even depression, among students. To address this, we propose a novel approach for measuring the semantic similarity of scientific paper texts. The approach fine-tunes SciBERT, a transformer-based deep learning model, for the semantic textual similarity (STS) task by training it on the SICKR-STS dataset and optimizing hyperparameters as required. The final model works in two phases, combining cross-encoder and bi-encoder techniques to achieve better results than previous work. The proposed system, SIMILARS, is evaluated on the test split of the SICKR-STS benchmark dataset by measuring the Pearson and Spearman correlations between its predicted similarity scores and the gold scores. Fine-tuning raised the Pearson correlation from 0.65 to 0.91 and the Spearman correlation from 0.61 to 0.84.
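The evaluation reports Pearson and Spearman correlations between predicted and gold similarity scores. The following is a minimal stdlib sketch of how these two metrics are typically computed, not the authors' evaluation code. The `gold` and `predicted` lists are made-up illustrative values, not data from SICKR-STS, and ties in Spearman are handled by assigning average ranks, which is a standard convention assumed here.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if either input is constant


def ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Illustrative values only: gold relatedness on a 1-5 scale vs. model cosine scores.
gold = [4.5, 3.0, 1.2, 2.8, 5.0]
predicted = [0.92, 0.55, 0.10, 0.60, 0.88]
print(round(pearson(gold, predicted), 3))
print(round(spearman(gold, predicted), 3))  # 0.8
```

Pearson measures how linearly the predicted scores track the gold scores, while Spearman only checks whether the two rank orders agree, which is why the two metrics can move by different amounts after fine-tuning.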