| dc.contributor.author | Jayasumana, Mallika Arachchige Prasad Akilendra | |
| dc.date.accessioned | 2020-07-24T18:28:11Z | |
| dc.date.available | 2020-07-24T18:28:11Z | |
| dc.date.issued | 2019 | |
| dc.identifier.citation | Jayasumana, Mallika Arachchige Prasad Akilendra (2019) Detection of Hot Topics on Twitter using Named Entities and Event based Incremental Clustering. MSc. Dissertation, Informatics Institute of Technology | en_US |
| dc.identifier.other | 2017061 | |
| dc.identifier.uri | http://dlib.iit.ac.lk/xmlui/handle/123456789/481 | |
| dc.description.abstract | Social media is a place where most people spend much of their day-to-day lives. It is a place where people communicate and interact with each other, share their own events, keep up to date on current events, and where a large number of people are active most of the time. Many parties would benefit greatly from identifying the currently trending hot topics, as this helps their business. Marketing companies that use trending hashtags when marketing their products are more likely to be noticed. News companies will be able to find the latest news and relevant feedback. While there are many approaches to detecting trending topics, most existing systems have not given much thought to real-time performance and have failed to remove unnecessary noise from the data, making them inefficient. This research investigates how to extract events from a Twitter data stream in real time and display them in the form of hot topics. To achieve this, an incremental event clustering approach is taken, based on the named entities in the tweets. Vectors generated by a pretrained Doc2Vec model are proposed for clustering the tweets into their respective events. Additionally, the tweets undergo a pre-processing stage where noise is removed and an event merging process where similar tweets are merged into the same cluster. In the testing and evaluation phase, the implemented DOH framework achieved a Normalised Mutual Information score of 0.911 and a Rand Index of 0.794 when tested on 100 labelled tweets. These results indicate that the proposed methods and algorithm are feasible. | en_US |
| dc.subject | Real-time Stream Analytics | en_US |
| dc.subject | Natural Language Processing | en_US |
| dc.subject | Incremental Clustering | en_US |
| dc.title | Detection of Hot Topics on Twitter using Named Entities and Event based Incremental Clustering | en_US |
| dc.type | Thesis | en_US |