Similarity analysis of short documents via NLP, DNN encoder-decoder mechanism and LSTM

Abeyratne, Ramitha Ishan

Collection: Dissertations & Thesis → MSc Bigdata Analytics → 2021


URI: http://dlib.iit.ac.lk/xmlui/handle/123456789/776

Date: 2021

Abstract:

Humans rely heavily on information in this competitive modern era, and forums are a common and prominent way of seeking it: people use them to express their opinions and to get answers to their questions. Off-topic posts are common in forums, and they reduce readability and degrade the user experience. It is therefore important to detect off-topic posts so that they can be managed. Identifying them is a tedious task, as typical forums contain a large amount of content, and the growth in internet users in recent years has made capturing off-topic posts even more challenging. This research demonstrates an automated solution for identifying off-topic posts in a forum. The core of the solution combines techniques from Natural Language Processing and Deep Neural Networks. The similarity analysis method represents documents as WordNet path vectors and compares them by the cosine of the angle between those vectors. Because this method is computationally expensive, a word count reduction model was introduced to decrease the number of words passed to the analysis. The reduction model was built using a Long Short-Term Memory (LSTM) encoder-decoder mechanism and trained on a food-domain dataset, and the entire prototype was tested exhaustively on a forum dataset from a similar domain. The vanilla similarity analysis mechanism recorded an accuracy of 71.36%, while the reduction-enabled model recorded 67.95%. A significant performance improvement was observed when the dataset was passed through the reduction model before similarity analysis.

Subject Descriptors: I.2 Artificial Intelligence; I.2.6 Learning; I.2.7 Natural Language Processing; H.3 Information Storage and Retrieval; H.3.3 Information Search and Retrieval
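The similarity analysis described in the abstract can be illustrated with a small sketch. This is not the thesis implementation and makes several assumptions: a hypothetical toy hypernym taxonomy (`TOY_HYPERNYMS`) stands in for WordNet, path similarity is taken as 1 / (1 + shortest path length) (the form WordNet path similarity uses), each document's vector holds, for every word in the combined vocabulary, that word's best path similarity to any word in the document, and the `is_off_topic` rule with a threshold of 0.5 is purely illustrative.

```python
from collections import deque
from math import sqrt

# Hypothetical toy taxonomy (child -> parent) standing in for WordNet's
# hypernym hierarchy; a real implementation would query WordNet instead.
TOY_HYPERNYMS = {
    "apple": "fruit", "banana": "fruit", "fruit": "food",
    "bread": "food", "cake": "food", "food": "entity",
    "car": "vehicle", "bus": "vehicle", "vehicle": "entity",
}

# Undirected adjacency map so paths can go both up and down the hierarchy.
ADJ = {}
for child, parent in TOY_HYPERNYMS.items():
    ADJ.setdefault(child, set()).add(parent)
    ADJ.setdefault(parent, set()).add(child)


def path_similarity(a, b):
    """Return 1 / (1 + shortest path length) between two words, 0.0 if unconnected."""
    if a == b:
        return 1.0
    if a not in ADJ or b not in ADJ:
        return 0.0
    seen = {a}
    queue = deque([(a, 0)])  # breadth-first search from a
    while queue:
        node, dist = queue.popleft()
        for nxt in ADJ[node]:
            if nxt == b:
                return 1.0 / (dist + 2)  # path length is dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


def path_vector(doc_words, vocabulary):
    """For each vocabulary word, keep its best path similarity to any word in the document."""
    if not doc_words:
        return [0.0] * len(vocabulary)
    return [max(path_similarity(v, w) for w in doc_words) for v in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def document_similarity(doc_a, doc_b):
    """Build path vectors over the combined vocabulary and compare them by cosine."""
    vocab = sorted(set(doc_a) | set(doc_b))
    return cosine(path_vector(doc_a, vocab), path_vector(doc_b, vocab))


def is_off_topic(post_words, topic_words, threshold=0.5):
    """Hypothetical decision rule: flag the post if its similarity falls below threshold."""
    return document_similarity(post_words, topic_words) < threshold


topic = ["apple", "cake"]
print(round(document_similarity(topic, ["banana", "bread"]), 2))  # 0.6
print(is_off_topic(["car", "bus"], topic))                         # True
```

Each document vector needs one `path_similarity` call per vocabulary word per document word, so the cost grows with both vocabulary size and document length. That growth is the motivation the abstract gives for shrinking posts with the LSTM-based word count reduction model (not sketched here) before running the similarity step.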


Files in this item: There are no files associated with this item.
