Digital Repository

Detection of Hot Topics on Twitter using Named Entities and Event based Incremental Clustering

dc.contributor.author Jayasumana, Mallika Arachchige Prasad Akilendra
dc.date.accessioned 2020-07-24T18:28:11Z
dc.date.available 2020-07-24T18:28:11Z
dc.date.issued 2019
dc.identifier.citation Jayasumana, Mallika Arachchige Prasad Akilendra (2019) Detection of Hot Topics on Twitter using Named Entities and Event based Incremental Clustering. MSc. Dissertation Informatics Institute of Technology en_US
dc.identifier.other 2017061
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/481
dc.description.abstract Social media is a place where most people spend much of their day-to-day lives. It is where people communicate and interact with each other, share their own events, keep up with current events, and where a large number of users are active most of the time. Many parties stand to benefit from identifying currently trending hot topics, as doing so can help their business. Marketing companies that use trending hashtags when promoting their products are more likely to be noticed. News companies can find the latest news and relevant feedback on it. While there are many approaches to detecting trending topics, most existing systems have given little thought to real-time performance and have failed to remove unnecessary noise from the data, making them inefficient. This research investigates how to extract events from a Twitter data stream in real time and display them as hot topics. To achieve this, an incremental event clustering approach based on the named entities of the tweets is taken. Pretrained Doc2Vec-generated vectors are proposed for clustering the tweets into their respective events. Additionally, the tweets undergo a pre-processing stage in which noise is removed, and an event merging process in which similar tweets are merged into the same cluster. In the testing and evaluation phase, the implemented DOH framework achieved a Normalised Mutual Information score of 0.911 and a Rand Index of 0.794 when tested on 100 labelled tweets. These results indicate that the proposed methods and algorithm are feasible. en_US
dc.subject Real-time Stream Analytics en_US
dc.subject Natural Language Processing en_US
dc.subject Incremental Clustering en_US
dc.title Detection of Hot Topics on Twitter using Named Entities and Event based Incremental Clustering en_US
dc.type Thesis en_US
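
The incremental clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's DOH framework. It assumes each tweet has already been turned into a fixed-length vector (for example by a Doc2Vec model), and it uses a hypothetical cosine-similarity threshold and running-mean centroids. The class name, function names and threshold value are illustrative assumptions, and named-entity filtering and event merging are left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clustering: each new vector joins the closest cluster or starts a new one.

    Hypothetical sketch; the threshold is an assumed parameter, not a value from the thesis.
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.centroids = []  # one running-mean vector per cluster
        self.counts = []     # number of vectors assigned to each cluster

    def add(self, vec):
        """Assign vec to a cluster and return that cluster's index."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim

        if best >= 0 and best_sim >= self.threshold:
            # Update the running mean of the matched cluster.
            n = self.counts[best]
            self.centroids[best] = [
                (c * n + v) / (n + 1) for c, v in zip(self.centroids[best], vec)
            ]
            self.counts[best] = n + 1
            return best

        # No sufficiently similar cluster: open a new one.
        self.centroids.append(list(vec))
        self.counts.append(1)
        return len(self.centroids) - 1


if __name__ == "__main__":
    # Toy 2-D vectors standing in for tweet embeddings.
    clusterer = IncrementalClusterer(threshold=0.8)
    stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    print([clusterer.add(v) for v in stream])
```

Because each vector is compared only against the current centroids, the cost per incoming tweet grows with the number of clusters rather than the number of tweets seen so far, which is the property that makes this style of clustering attractive for streams.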

