dc.contributor.author |
Abeyratne, Ramitha |
|
dc.contributor.author |
Farook, Cassim |
|
dc.date.accessioned |
2019-02-02T07:53:04Z |
|
dc.date.available |
2019-02-02T07:53:04Z |
|
dc.date.issued |
2018 |
|
dc.identifier.citation |
Abeyratne, R and Farook, C (2018) A Hybrid Method for Dissimilarity Analysis between Short Text Documents. In: 2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka, 26-29 Sept. 2018. IEEE, pp. 21-26. DOI: 10.1109/ICTER.2018.8615493 |
en_US |
dc.identifier.uri |
https://ieeexplore.ieee.org/document/8615493 |
|
dc.identifier.uri |
http://dlib.iit.ac.lk/xmlui/handle/123456789/41 |
|
dc.description.abstract |
Similarity analysis is a widely studied area of Natural Language Processing (NLP). Most existing work focuses on analysing the content of large documents; comparatively few studies analyse similarity between short, unstructured documents. This work proposes a hybrid approach that uses WordNet path vector cosine angle analysis and Dice coefficient overlap level analysis to determine the similarity levels of short texts. A regression model dynamically weights the two individually calculated scores and combines them into a single score. This hybrid approach was found to achieve significantly higher accuracy than the Term Frequency-Inverse Document Frequency (TF-IDF) and Dice coefficient techniques. |
en_US |
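The abstract combines two scores with a regression model. The sketch below shows one plausible reading of that pipeline; it is not the authors' implementation. Assumptions: texts are already tokenised; the "path vector" is built over the joint vocabulary of both texts, with each entry being the best word-to-word similarity (a common semantic-vector construction); the regression is ordinary least squares without an intercept. In practice `word_sim` would wrap WordNet path similarity (for example NLTK's `Synset.path_similarity`, maximised over the synsets of both words); here the hypothetical `toy_word_sim` lookup table stands in so the example runs without any corpus download. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

WordSim = Callable[[str, str], float]

def dice_coefficient(a: Sequence[str], b: Sequence[str]) -> float:
    """Dice overlap of two token sets: 2|A & B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))

def path_vector_cosine(a: Sequence[str], b: Sequence[str], word_sim: WordSim) -> float:
    """Cosine between semantic vectors built over the joint vocabulary.

    Each entry is the best word_sim score between one joint-vocabulary word
    and any token of the text (1.0 when the word itself occurs in the text).
    """
    joint = sorted(set(a) | set(b))

    def vector(tokens: Sequence[str]) -> List[float]:
        return [max((word_sim(w, t) for t in tokens), default=0.0) for w in joint]

    va, vb = vector(a), vector(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

def fit_weights(cos_scores: Sequence[float], dice_scores: Sequence[float],
                gold: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (assumed: no intercept) for gold ~ w1*cos + w2*dice."""
    s11 = sum(c * c for c in cos_scores)
    s12 = sum(c * d for c, d in zip(cos_scores, dice_scores))
    s22 = sum(d * d for d in dice_scores)
    t1 = sum(c * y for c, y in zip(cos_scores, gold))
    t2 = sum(d * y for d, y in zip(dice_scores, gold))
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 normal-equation matrix
    if det == 0:
        raise ValueError("features are collinear; cannot fit two weights")
    return (t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

def hybrid_score(cos: float, dice: float, weights: Tuple[float, float]) -> float:
    """Combine the two individual scores using the fitted weights."""
    return weights[0] * cos + weights[1] * dice

# Hypothetical stand-in for WordNet path similarity, for demonstration only.
_TOY_PAIRS = {frozenset({"car", "automobile"}): 0.9, frozenset({"car", "bus"}): 0.3}

def toy_word_sim(w1: str, w2: str) -> float:
    if w1 == w2:
        return 1.0
    return _TOY_PAIRS.get(frozenset({w1, w2}), 0.0)

if __name__ == "__main__":
    a, b = ["red", "car"], ["red", "automobile"]
    print(dice_coefficient(a, b))                  # 0.5: only "red" overlaps
    print(path_vector_cosine(a, b, toy_word_sim))  # roughly 0.996: synonyms still count
```

The example pair illustrates why the two scores complement each other: the Dice overlap only rewards exact token matches, while the path-vector cosine also credits semantically related words such as "car" and "automobile". The fitted weights then decide how much each signal contributes, which is the role the abstract assigns to the regression model.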
dc.publisher |
IEEE |
en_US |
dc.subject |
WordNet path vector cosine angle |
en_US |
dc.subject |
Similarity analysis |
en_US |
dc.subject |
Dice coefficient overlap level analysis |
en_US |
dc.title |
A Hybrid Method for Dissimilarity Analysis between Short Text Documents |
en_US |
dc.type |
Article |
en_US |