Digital Repository

Similarity analysis of short documents VIA NLP, DNN encoder-decoder mechanism and LSTM

dc.contributor.author Abeyratne, Ramitha Ishan
dc.date.accessioned 2022-02-25T09:32:32Z
dc.date.available 2022-02-25T09:32:32Z
dc.date.issued 2021
dc.identifier.citation Abeyratne, Ramitha Ishan (2021) Similarity analysis of short documents VIA NLP, DNN encoder-decoder mechanism and LSTM. MSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2018592
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/776
dc.description.abstract Humans rely deeply on information in this competitive modern era. Forums are a common and prominent way of seeking it: people use them to express opinions and gather answers to questions. Off-topic posts are common in forums, and they reduce readability and degrade the user experience, so it is important to detect and manage them. Identification is a tedious task because a typical forum contains a significant amount of content, and the growth in internet users in recent years has made capturing off-topic posts even more challenging. This research demonstrates an automated solution for identifying off-topic posts in a forum. The core of the solution combines techniques from Natural Language Processing and Deep Neural Networks. The similarity analysis method represents documents as WordNet path vectors and compares them using a cosine angle difference calculation. Because this method is computationally expensive, a word count reduction model was introduced to decrease the number of words passed to the analysis. It was built using a Long Short-Term Memory (LSTM) encoder-decoder mechanism. The reduction model was trained on a food-domain dataset, and the entire prototype was tested exhaustively on a forum dataset from a similar domain. The vanilla similarity analysis mechanism recorded an accuracy of 71.36%, while the reduction-enabled model recorded 67.95%. A significant performance increase was observed when the dataset was passed through reduction before similarity analysis. Subject Descriptors: I.2 Artificial Intelligence; I.2.6 Learning; I.2.7 Natural Language Processing; H.3 Information Storage and Retrieval; H.3.3 Information Search and Retrieval en_US
dc.language.iso en en_US
dc.subject Attention en_US
dc.subject Encoder-decoder en_US
dc.subject LSTM en_US
dc.subject Deep Neural Networks en_US
dc.subject WordNet path en_US
dc.subject Cosine angle en_US
dc.subject Vector en_US
dc.subject Semantic similarity en_US
dc.subject Natural Language Processing en_US
dc.title Similarity analysis of short documents VIA NLP, DNN encoder-decoder mechanism and LSTM en_US
dc.type Thesis en_US
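
Illustrative note (not part of the deposited record): the abstract describes comparing documents through WordNet path vectors and a cosine angle calculation. The sketch below shows one way such a comparison could work. It is not the dissertation's implementation. The toy taxonomy `PARENT` stands in for WordNet, and the anchor-concept vector layout and the function names (`path_similarity`, `document_vector`, `cosine_similarity`) are assumptions made for illustration. The path score uses the common 1 / (1 + shortest path length) form.

```python
import math

# Hypothetical toy taxonomy standing in for WordNet (child -> parent).
PARENT = {
    "entity": None,
    "food": "entity",
    "fruit": "food",
    "apple": "fruit",
    "banana": "fruit",
    "bread": "food",
    "vehicle": "entity",
    "car": "vehicle",
}


def path_to_root(word):
    """Return the chain of concepts from `word` up to the taxonomy root."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def path_similarity(a, b):
    """Score two concepts as 1 / (1 + number of edges on the shortest path)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            # First shared ancestor found walking up from `a`.
            return 1.0 / (1.0 + steps_a + path_b.index(node))
    return 0.0


def document_vector(words, anchors):
    """Assumed layout: one component per anchor concept, holding the best
    path similarity between that anchor and any known word in the document."""
    known = [w for w in words if w in PARENT]
    return [max((path_similarity(w, a) for w in known), default=0.0)
            for a in anchors]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    anchors = ["apple", "bread", "car"]
    topic = document_vector(["fruit", "bread"], anchors)
    on_topic_post = document_vector(["apple", "banana"], anchors)
    off_topic_post = document_vector(["car", "vehicle"], anchors)
    print("on-topic score: ", cosine_similarity(topic, on_topic_post))
    print("off-topic score:", cosine_similarity(topic, off_topic_post))
```

In this sketch a post would be flagged as off-topic when its score against the thread topic falls below some threshold. The threshold and the choice of anchor concepts are left open here because the record does not state them.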


Files in this item

There are no files associated with this item.
