dc.contributor.author |
Jayasumana, Mallika Arachchige Prasad Akilendra |
|
dc.date.accessioned |
2020-07-24T18:28:11Z |
|
dc.date.available |
2020-07-24T18:28:11Z |
|
dc.date.issued |
2019 |
|
dc.identifier.citation |
Jayasumana, Mallika Arachchige Prasad Akilendra (2019) Detection of Hot Topics on Twitter using Named Entities and Event based Incremental Clustering. MSc. Dissertation, Informatics Institute of Technology |
en_US |
dc.identifier.other |
2017061 |
|
dc.identifier.uri |
http://dlib.iit.ac.lk/xmlui/handle/123456789/481 |
|
dc.description.abstract |
Social media is a place where most people spend much of their day-to-day lives. It is a place where
people communicate and interact with each other, share their own events, keep up with
current events, and where many people are active most of the time.
Many parties stand to benefit from identifying the currently trending hot topics, as it will help their
business. Marketing companies that use trending hashtags when marketing their products are more
likely to be noticed. News companies will be able to find the latest news and relevant feedback for
them.
While there are many approaches to detecting trending topics, most existing systems have
given little thought to real-time performance and have failed to remove unnecessary noise from the data,
making them inefficient.
This research investigates how to extract events from a Twitter data stream in real time and
display them in the form of hot topics. To achieve this, an incremental event clustering approach is
taken, based on the named entities of the tweets. Pretrained Doc2Vec-generated vectors are
proposed for clustering the tweets into their respective events.
Additionally, the tweets undergo a pre-processing stage in which noise is removed and an event
merging process in which similar tweets are merged into the same cluster.
After the testing and evaluation phase, the implemented DOH framework achieved a Normalised Mutual
Information score of 0.911 and a Rand Index of 0.794 when tested on 100 labelled tweets. The
proposed methods and algorithm have proven feasible, which is justified by these successful results. |
en_US |
dc.subject |
Real-time Stream Analytics |
en_US |
dc.subject |
Natural Language Processing |
en_US |
dc.subject |
Incremental Clustering |
en_US |
dc.title |
Detection of Hot Topics on Twitter using Named Entities and Event based Incremental Clustering |
en_US |
dc.type |
Thesis |
en_US |