| dc.contributor.author | Abeyratne, Ramitha | |
| dc.contributor.author | Farook, Cassim | |
| dc.date.accessioned | 2019-02-02T07:53:04Z | |
| dc.date.available | 2019-02-02T07:53:04Z | |
| dc.date.issued | 2018 | |
| dc.identifier.citation | Abeyratne, R and Farook, C (2018) A Hybrid Method for Dissimilarity Analysis between Short Text Documents. In: 2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka, 26-29 Sept. 2018. IEEE, pp. 21-26. DOI: 10.1109/ICTER.2018.8615493 | en_US |
| dc.identifier.uri | https://ieeexplore.ieee.org/document/8615493 | |
| dc.identifier.uri | http://dlib.iit.ac.lk/xmlui/handle/123456789/41 | |
| dc.description.abstract | Similarity analysis is an extremely popular aspect of Natural Language Processing (NLP). Most existing work focuses on analysing the content of large documents; comparatively few studies analyse similarity between short unstructured documents. This work proposes a hybrid approach that uses WordNet Path vector cosine angle analysis and Dice co-efficient overlap level analysis to determine the similarity levels of short texts. A regression model is used to dynamically weight and combine the two individual scores into a single score. This hybrid approach was found to have significantly higher accuracy than Term Frequency Inverse Document Frequency (TF-IDF) and Dice co-efficient techniques. | en_US |
| dc.publisher | IEEE | en_US |
| dc.subject | WordNet Path vector cosine angle | en_US |
| dc.subject | Similarity analysis | en_US |
| dc.subject | Dice co-efficient overlap level analysis | en_US |
| dc.title | A Hybrid Method for Dissimilarity Analysis between Short Text Documents | en_US |
| dc.type | Article | en_US |
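
The scoring idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine score here is computed over plain term-count vectors rather than WordNet Path vectors, and the weights in `hybrid_score` are hypothetical fixed placeholders standing in for the coefficients that the paper's regression model would supply. All function and parameter names are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def dice_coefficient(a: str, b: str) -> float:
    """Dice overlap between the token sets of two short texts."""
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x and not y:
        return 1.0  # two empty texts are treated as identical
    return 2 * len(x & y) / (len(x) + len(y))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between term-count vectors.

    Assumption: a stand-in for the WordNet Path vectors named in the
    abstract, which would need a lexical database to build.
    """
    u, v = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(u[t] * v[t] for t in u)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def hybrid_score(a: str, b: str, w_cosine: float = 0.5,
                 w_dice: float = 0.5, bias: float = 0.0) -> float:
    """Linear combination of the two scores.

    The default weights are hypothetical; in the described approach a
    regression model would determine them.
    """
    return bias + w_cosine * cosine_similarity(a, b) + w_dice * dice_coefficient(a, b)
```

Calling `hybrid_score("the cat sat", "the cat ran")` combines a Dice overlap of 2/3 with a term-count cosine of 2/3. Replacing the fixed weights with coefficients fitted on labelled sentence pairs would bring the sketch closer to the regression-weighted combination the abstract describes.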