Digital Repository

Forum Off-topic Post Detection Using Natural Language Processing


dc.contributor.advisor Farook, Cassim
dc.contributor.author Abeyratne, Ramitha Ishan
dc.date.accessioned 2019-02-19T16:09:00Z
dc.date.available 2019-02-19T16:09:00Z
dc.date.issued 2018
dc.identifier.citation Abeyratne, R. I. (2018) Forum Off-topic Post Detection Using Natural Language Processing. BSc. Dissertation. Informatics Institute of Technology en_US
dc.identifier.other 2014067
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/128
dc.description.abstract In today's fast-moving world, people constantly need to seek information. Online forums are one prominent way of meeting this demand: people use them to create topics, post questions, search for answers, hold discussions, and reply to threads. With the rapid growth in the number of internet users, a drastic increase in forum users has been observed, and a number of issues have been identified in managing forums. One of these is handling off-topic posts, which is among the most complex tasks in online forum management. Off-topic posts break the flow of knowledge stored within threads and significantly reduce the readability of forums. Off-topic posts are currently detected manually, which is tedious and becomes nearly impossible as the number of threads or posts grows. This research presents an automated web-based solution for detecting off-topic posts in online forums, using Natural Language Processing to differentiate off-topic content from relevant content. A modified similarity algorithm is proposed. Two distinct scores are generated for each post: a semantic score, computed as the cosine of the angle between vectors built from WordNet “path” similarity, and a lexical score, computed as the Dice coefficient of word overlap. The final dissimilarity score is calculated by dynamically weighting the two individual scores according to the average thread word count, using a regression model. The modified algorithm was compared against TF-IDF and Dice, using a forum dataset obtained from Stack Exchange as input. Results show an increase in accuracy of as much as 73.34%. en_US
dc.subject Natural Language Processing en_US
dc.subject Information retrieval en_US
dc.title Forum Off-topic Post Detection Using Natural Language Processing en_US
dc.type Thesis en_US

