Digital Repository

A Hybrid Method for Dissimilarity Analysis between Short Text Documents

dc.contributor.author Abeyratne, Ramitha
dc.contributor.author Farook, Cassim
dc.date.accessioned 2019-02-02T07:53:04Z
dc.date.available 2019-02-02T07:53:04Z
dc.date.issued 2018
dc.identifier.citation Abeyratne, R and Farook, C (2018) A Hybrid Method for Dissimilarity Analysis between Short Text Documents. In: 2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka, 26-29 Sept. 2018. IEEE, pp. 21-26. DOI: 10.1109/ICTER.2018.8615493 en_US
dc.identifier.uri https://ieeexplore.ieee.org/document/8615493
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/41
dc.description.abstract Similarity analysis is a popular area of Natural Language Processing (NLP). Most existing work focuses on analysing the content of large documents; comparatively few studies address similarity between short, unstructured documents. This work proposes a hybrid approach that uses WordNet Path vector cosine angle analysis and Dice co-efficient overlap level analysis to determine the similarity levels of short texts. A regression model dynamically weights the two individual scores and combines them into a single score. This hybrid approach was found to achieve significantly higher accuracy than the Term Frequency-Inverse Document Frequency (TF-IDF) and Dice co-efficient techniques. en_US
dc.publisher IEEE en_US
dc.subject WordNet Path vector cosine angle en_US
dc.subject Similarity analysis en_US
dc.subject Dice co-efficient overlap level analysis en_US
dc.title A Hybrid Method for Dissimilarity Analysis between Short Text Documents en_US
dc.type Article en_US
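
The hybrid scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not say how the WordNet Path vectors are built or which regression model is used, so the sketch makes two labeled assumptions. Plain term-count vectors stand in for the WordNet Path vectors, and the weights `w_cosine`, `w_dice`, and `bias` are hypothetical placeholders for values that a fitted regression model would supply. The function names are likewise illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def dice_coefficient(a, b):
    """Dice overlap between the token sets of two texts: 2|A∩B| / (|A| + |B|)."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def cosine_similarity(a, b):
    """Cosine of the angle between term-count vectors.

    Assumption: term counts stand in for the WordNet Path vectors named in
    the abstract, whose construction is not given in this record.
    """
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(a, b, w_cosine=0.5, w_dice=0.5, bias=0.0):
    """Linear combination of the two individual scores.

    The weights and bias are hypothetical placeholders; in a regression-based
    setup they would be learned from labeled text pairs.
    """
    return w_cosine * cosine_similarity(a, b) + w_dice * dice_coefficient(a, b) + bias
```

Calling `hybrid_score("the cat sat", "the cat ran")` combines a Dice score and a cosine score that each reflect two shared tokens out of three. Replacing the fixed weights with coefficients fitted on human-rated sentence pairs would bring the sketch closer to the dynamic weighting the abstract describes.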