Forum Off-topic Post Detection Using Natural Language Processing

Abeyratne, Ramitha Ishan

URI: http://dlib.iit.ac.lk/xmlui/handle/123456789/128

Date: 2018

Abstract:

People living in today's fast-moving world constantly need to seek information, and forums are one prominent way of meeting this demand. People use forums to create topics, post questions, search for answers, discuss, and reply to threads. With the rapid growth in the number of internet users, a drastic increase in forum users has been observed, and a number of issues have been identified in managing forums. One of these, managing off-topic posts, is among the most complex tasks of online forum management. Off-topic posts break the flow of knowledge stored within threads and significantly reduce the readability of forums. They are currently detected manually, which is tedious and becomes nearly impossible as the number of threads and posts grows. This research presents an automated web-based solution for detecting off-topic posts in online forums. Natural Language Processing is used to differentiate off-topic content from relevant content, and a modified algorithm is proposed for evaluating similarity. Two techniques generate two distinct scores for each post: a semantic analysis based on WordNet “path” similarity and the cosine of the angle between vectors, and a lexical analysis based on the Dice coefficient of word overlap. The final dissimilarity score is calculated by dynamically weighting the two individual scores according to the average thread word count, using a regression model. The modified algorithm was compared against TF-IDF and Dice, using a forum dataset obtained from Stack Exchange as input. Results show a substantial increase in accuracy, as high as 73.34%.
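The scoring idea in the abstract (a lexical Dice score and a cosine-based score, blended into one dissimilarity score with a weight that depends on average thread word count) can be illustrated with a minimal sketch. This is not the thesis implementation. Assumptions: the WordNet "path" step is replaced by plain term-count vectors so the example runs without NLTK corpora, and the weighting rule `semantic_weight` is a hypothetical clipped linear stand-in with invented coefficients for the regression model, whose actual form is not given in the abstract. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dice_coefficient(a: list[str], b: list[str]) -> float:
    """Lexical overlap of the two token sets: 2|A & B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine of the angle between raw term-count vectors.

    ASSUMPTION: stands in for the WordNet "path"-based vectors used in the thesis.
    """
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_weight(avg_thread_words: float) -> float:
    """HYPOTHETICAL weight for the semantic score, clipped to [0, 1].

    The coefficients are invented placeholders for the regression model.
    """
    return min(1.0, max(0.0, 0.2 + 0.004 * avg_thread_words))


def dissimilarity(post: str, thread: str, avg_thread_words: float) -> float:
    """Blend both similarity scores with the dynamic weight, then invert."""
    p, t = tokenize(post), tokenize(thread)
    w = semantic_weight(avg_thread_words)
    similarity = w * cosine_similarity(p, t) + (1 - w) * dice_coefficient(p, t)
    return 1.0 - similarity


if __name__ == "__main__":
    thread = "How do I sort a list in Python by key"
    on_topic = "Use sorted with the key argument to sort a list"
    off_topic = "Does anyone know a good pizza place downtown"
    for post in (on_topic, off_topic):
        print(f"{dissimilarity(post, thread, 50):.3f}  {post}")
```

Run against the sample thread, the off-topic post should receive the higher dissimilarity score because it shares only the word "a" with the thread. A real detector would then flag posts whose score exceeds some threshold; the abstract does not state how that threshold was chosen.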