Abstract:
"The volume of information available on the internet can be overwhelming, making it difficult for readers to keep up. Article summarization addresses this problem by condensing long articles into shorter texts that retain the most important information.
Most article summarization methods rely on supervised learning algorithms, which require labeled data: examples tagged with the correct answer, which for summarization typically means articles paired with reference summaries. Such data is difficult to obtain for article summarization, so this project proposes an unsupervised hybrid approach that needs no labels.
The approach combines K-means clustering and Latent Semantic Analysis (LSA): K-means groups similar sentences together, while LSA extracts the article's main latent topics. Each sentence is then scored on its similarity to these topics, its relevance to the article's keywords, and other factors, and the top-scoring sentences are selected to form the summary.
The approach was evaluated on a variety of articles and compared with existing supervised approaches, which it outperformed in terms of ROUGE scores, a standard family of metrics for evaluating summaries. Because it requires no labeled data, the approach can also be applied to new datasets without additional annotation."
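To make the K-means + LSA pipeline more concrete, here is a minimal NumPy-only sketch of a hybrid extractive summarizer. It is not the project's implementation. The sentence splitter, the TF-IDF weighting, the keyword heuristic (most frequent words longer than three letters), the 0.7/0.3 score weights, and the rule of keeping the best-scoring sentence from each cluster are all illustrative assumptions. The abstract's "other factors" are omitted.

```python
import re
from collections import Counter

import numpy as np


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf(sentences):
    """Sentence-by-term TF-IDF matrix with L2-normalised rows."""
    docs = [tokenize(s) for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    col = {w: i for i, w in enumerate(vocab)}
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    X = np.zeros((n, len(vocab)))
    for r, d in enumerate(docs):
        for w, c in Counter(d).items():
            X[r, col[w]] = c * (np.log((1 + n) / (1 + df[w])) + 1)  # smoothed IDF
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1, norms), docs


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per sentence (row of X)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def lsa_scores(X, n_topics):
    """LSA via truncated SVD: score each sentence by its length in topic space."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    t = min(n_topics, len(S))
    return np.linalg.norm(U[:, :t] * S[:t], axis=1)


def keyword_scores(docs, n_keywords=5):
    # Hypothetical keyword heuristic (assumption): most frequent words longer than 3 letters.
    counts = Counter(w for d in docs for w in d if len(w) > 3)
    keywords = {w for w, _ in counts.most_common(n_keywords)}
    return np.array([len(keywords & set(d)) / max(len(keywords), 1) for d in docs])


def summarize(text, n_clusters=2, n_topics=2, w_topic=0.7, w_keyword=0.3):
    sentences = split_sentences(text)
    if len(sentences) <= n_clusters:
        return sentences
    X, docs = tfidf(sentences)
    topic = lsa_scores(X, n_topics)
    topic = topic / (topic.max() or 1)  # scale topic scores to [0, 1]
    # Assumed weighting: a simple linear mix of topic and keyword scores.
    score = w_topic * topic + w_keyword * keyword_scores(docs)
    labels = kmeans(X, n_clusters)
    # Keep the best-scoring sentence from each non-empty cluster to limit redundancy,
    # then restore the original sentence order.
    picks = [max(np.flatnonzero(labels == j), key=lambda i: score[i])
             for j in range(n_clusters) if np.any(labels == j)]
    return [sentences[i] for i in sorted(picks)]


article = (
    "Solar power capacity grew rapidly last year. "
    "Cheaper panels drove most of the solar growth. "
    "Wind farms also expanded along the coast. "
    "Offshore wind projects attracted new investment. "
    "Analysts expect solar and wind to keep growing."
)
print(summarize(article))
```

Drawing one sentence per cluster is just one way to let the clusters enforce topical coverage. A plain global top-k by score is the simpler alternative, but it can select several near-duplicate sentences.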
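ROUGE-N measures the overlap of n-grams between a candidate summary and a reference summary. The following stdlib-only sketch computes ROUGE-N recall, precision, and F1. It is a simplification of the official toolkit, which also supports stemming, stopword removal, and multiple references. Whitespace tokenization and a single reference are assumptions made here, and the abstract does not state which ROUGE variants were reported.

```python
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall, precision and F1 for one candidate against one reference.

    Overlap counts are clipped: an n-gram is matched at most as many times
    as it appears in both texts (Counter '&' keeps the minimum count).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


print(rouge_n("the cat sat on the mat", "the cat is on the mat"))
# recall = precision = f1 = 5/6 (about 0.833): 5 of the 6 unigrams overlap
```

With `n=2` the same pair shares 3 of 5 bigrams, giving a ROUGE-2 recall of 0.6. Higher-order n-grams reward summaries that preserve word order as well as vocabulary.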