A Hybrid Method for Dissimilarity Analysis between Short Text Documents

Abeyratne, Ramitha; Farook, Cassim

URI: https://ieeexplore.ieee.org/document/8615493
http://dlib.iit.ac.lk/xmlui/handle/123456789/41

Date: 2018

Abstract:

Similarity analysis is an extremely popular aspect of Natural Language Processing (NLP). Most existing work focuses on analysing the content of large documents; comparatively few studies analyse similarity between short unstructured documents. This work proposes a hybrid approach that uses WordNet path vector cosine angle analysis and Dice coefficient overlap level analysis to determine the similarity levels of short texts. A regression model is used to dynamically weight and combine the two individual scores into a single score. This hybrid approach was found to achieve significantly higher accuracy than the Term Frequency-Inverse Document Frequency (TF-IDF) and Dice coefficient techniques.
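
The scoring pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain term-count vectors stand in for the WordNet path vectors, and the function names, whitespace tokenisation, and the weights in `combine` are assumptions made for illustration only. In the paper the weights come from a fitted regression model, whose coefficients the abstract does not give.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenisation (an assumption, not the paper's)."""
    return text.lower().split()


def dice_coefficient(a, b):
    """Dice overlap on token sets: 2 * |A & B| / (|A| + |B|)."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def term_vectors(a, b):
    """Term-count vectors over the shared vocabulary.

    Stand-in for the WordNet path vectors used in the paper.
    """
    count_a, count_b = Counter(tokens(a)), Counter(tokens(b))
    vocab = sorted(set(count_a) | set(count_b))
    return [count_a[w] for w in vocab], [count_b[w] for w in vocab]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combine(cos_score, dice_score, w_cos=0.5, w_dice=0.5, bias=0.0):
    """Linear combination of the two scores.

    The default weights are hypothetical placeholders; the paper learns
    the weighting with a regression model.
    """
    return bias + w_cos * cos_score + w_dice * dice_score


if __name__ == "__main__":
    s1, s2 = "the cat sat", "the cat ran"
    u, v = term_vectors(s1, s2)
    print(combine(cosine_similarity(u, v), dice_coefficient(s1, s2)))
```

Keeping the two scores separate until the final step mirrors the structure the abstract describes: each measure can be swapped out or reweighted without touching the other.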
