Predicting Heart Attack with Imbalanced Dataset Using Machine Learning Techniques

Subawickrama, Amani

Dissertations & Thesis, MSc Business Analytics

URI: http://dlib.iit.ac.lk/xmlui/handle/123456789/1661

Date: 2023

Abstract:

Heart attacks, also known as myocardial infarctions, pose a significant health risk, and accurate prediction methods are needed to enable timely intervention and prevention. This study presents a comprehensive evaluation of machine learning techniques for heart attack prediction.

Objective: The primary aim of this study is to identify the most suitable machine learning technique for predicting heart attacks by integrating various feature selection methods and mitigating the impact of dataset imbalance, thereby improving the accuracy of predictive models.

Methods: A substantial dataset pertaining to heart attack prediction was collected, and 10 independent variables crucial to the classification process were selected. The predictive performance of eight prominent machine learning classifiers was evaluated: J48, Random Tree, Random Forest, Naïve Bayes, REP Tree, k-NN, SVM, and Multilayer Perceptron. To improve classification accuracy, a range of feature selection methods was incorporated, including Information Gain, Gain Ratio, Chi-Square, OneR, Wrapper, and CfsSubsetEval with Particle Swarm Optimization. Class imbalance was handled with a hybrid sampling approach combining SMOTE oversampling and SpreadSubsample undersampling.

Results: The classifiers were first evaluated on the original, class-imbalanced dataset. Surprisingly, six classifiers (J48, Random Forest, REP Tree, Naïve Bayes, Neural Network, and SVM) achieved 100% accuracy, precision, recall, F-measure, and AUC. Applying CfsSubsetEval feature selection with Particle Swarm Optimization further refined the models, reducing the training time of the neural network from 2022.75 s to 330.41 s while maintaining perfect accuracy. Furthermore, applying CfsSubsetEval with Particle Swarm Optimization to the balanced dataset raised the accuracy of most techniques to 100%, the exception being Naïve Bayes at 98.8877%.

Conclusion: The study highlights the strong performance of different machine learning techniques for heart attack classification. Because each technique has its own strengths, choosing the best method depends on practical factors such as available resources and ease of use. A combined approach, or further testing, could help select the right method for real-world situations.
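
The hybrid sampling step pairs SMOTE, which creates synthetic minority-class examples by interpolating between a minority sample and one of its nearest minority-class neighbours, with SpreadSubsample, which randomly undersamples classes down to a maximum class spread. The pure-Python sketch below only illustrates that idea on numeric feature tuples; it is not the implementation used in the study, and the function names and parameters (`smote_percent`, `max_spread`, `k`) are hypothetical choices loosely modelled on those filters.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from a list of minority-class tuples.

    Each synthetic point lies on the segment between a minority sample and one of
    its k nearest minority-class neighbours (Euclidean distance). Needs >= 2 samples.
    """
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for i in range(n_new):
        idx = i % len(minority)  # cycle through the minority samples
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nn)))
    return synthetic


def spread_subsample(X, y, max_spread=1.0, rng=None):
    """Randomly undersample every class to at most max_spread * (smallest class size)."""
    rng = rng or random.Random(0)
    cap = int(min(Counter(y).values()) * max_spread)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        keep = rows if len(rows) <= cap else rng.sample(rows, cap)
        X_out.extend(keep)
        y_out.extend([label] * len(keep))
    return X_out, y_out


def hybrid_resample(X, y, minority_label, smote_percent=100, k=5, seed=0):
    """Oversample the minority class with SMOTE, then balance by random undersampling."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = smote(minority, len(minority) * smote_percent // 100, k=k, rng=rng)
    X_aug = list(X) + synthetic
    y_aug = list(y) + [minority_label] * len(synthetic)
    return spread_subsample(X_aug, y_aug, max_spread=1.0, rng=rng)


# Toy data (hypothetical): 20 majority samples, 5 minority samples.
X = [(10.0 + i, 0.0) for i in range(20)] + [(0.0, 5.0 + i) for i in range(5)]
y = [0] * 20 + [1] * 5
X_bal, y_bal = hybrid_resample(X, y, minority_label=1)
print(Counter(y_bal))  # Counter({0: 10, 1: 10})
```

Oversampling first and then undersampling means neither step has to go to an extreme: the minority class is only doubled here, and the majority class is trimmed to match rather than to the original minority size.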
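
Three of the filter methods named above score each feature independently against the class. The sketch below uses their standard definitions: information gain H(C) - H(C | A), gain ratio (information gain divided by the feature's own entropy), and the Pearson chi-square statistic of the feature-by-class contingency table. It assumes features are already discrete (numeric attributes would need discretising first), the feature names in the toy data are hypothetical, and it does not cover the OneR or Wrapper evaluators.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """IG(C; A) = H(C) - H(C | A) for a discrete feature A and class labels C."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [c for a, c in zip(feature, labels) if a == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the feature's own entropy (split information)."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info > 0 else 0.0


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    feature_counts, label_counts = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for value, f_count in feature_counts.items():
        for label, l_count in label_counts.items():
            expected = f_count * l_count / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def rank_features(features, labels, scorer):
    """Return feature names sorted from most to least relevant under `scorer`."""
    return sorted(features, key=lambda name: scorer(features[name], labels), reverse=True)


# Toy data (hypothetical): 'relevant' tracks the label exactly, 'noise' does not.
labels = [1, 1, 0, 0]
features = {"noise": ["a", "b", "a", "b"], "relevant": ["y", "y", "n", "n"]}
print(rank_features(features, labels, info_gain))  # ['relevant', 'noise']
```

A ranker like this only orders features; choosing how many of the top-ranked ones to keep is a separate decision.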
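
CfsSubsetEval scores a whole subset of features rather than single features: in correlation-based feature selection, a subset is good when its features correlate strongly with the class but weakly with each other, with correlation measured by symmetric uncertainty for discrete data. The sketch below computes that merit and searches for a high-merit subset with a minimal binary particle swarm, where each velocity is turned into the probability of selecting a feature through a sigmoid. It assumes discrete features, and the swarm parameters (`w`, `c1`, `c2`, `v_max`, swarm size, iterations) are illustrative values, not the settings used in the study.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)), a correlation in [0, 1]."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0
    mutual_info = h_a + h_b - entropy(list(zip(a, b)))  # uses the joint entropy
    return 2.0 * mutual_info / (h_a + h_b)


def cfs_merit(subset, features, labels):
    """Merit = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)) for k = len(subset)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(symmetric_uncertainty(features[i], labels) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(symmetric_uncertainty(features[i], features[j]) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def binary_pso(features, labels, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Search feature subsets (bit vectors) for maximal CFS merit with a binary PSO."""
    rng = random.Random(seed)
    dim = len(features)

    def merit(bits):
        return cfs_merit([i for i, b in enumerate(bits) if b], features, labels)

    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [merit(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[best][:], pbest_score[best]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: the bit is set with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            score = merit(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
            if score > gbest_score:
                gbest, gbest_score = pos[i][:], score
    return [i for i, b in enumerate(gbest) if b], gbest_score


# Toy data (hypothetical): feature 0 matches the class, 1 is unrelated, 2 duplicates 0.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = [labels[:], [0, 1, 0, 1, 0, 1, 0, 1], labels[:]]
subset, score = binary_pso(features, labels)
print(subset, score)
```

On this toy data the unrelated feature never helps, and adding the redundant copy of feature 0 does not raise the merit, which is the behaviour that lets CFS shrink the feature set and, in turn, training time.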
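
The reported metrics follow their standard confusion-matrix definitions, and AUC can be computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting one half. The sketch below treats class 1 as the positive class and uses a hypothetical decision threshold of 0.5. Some tools report class-weighted averages of precision, recall and F-measure instead, so this shows the definitions rather than reproducing the study's figures.

```python
def roc_auc(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F-measure and AUC for 0/1 labels and scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "auc": roc_auc(y_true, y_score),
    }


print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f_measure': 0.5, 'auc': 0.75}
```

A score of 100% on all five metrics, as reported above, requires every positive to be scored above every negative (AUC = 1) and no misclassification at the chosen threshold.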
