A Comparative Evaluation of Machine Learning Models for Lung Cancer Stage Prediction in Smokers

Liyanage, Dinithi

dc.contributor.author	Liyanage, Dinithi
dc.date.accessioned	2025-07-01T10:25:59Z
dc.date.available	2025-07-01T10:25:59Z
dc.date.issued	2024
dc.identifier.citation	Liyanage, Dinithi (2024) A Comparative Evaluation of Machine Learning Models for Lung Cancer Stage Prediction in Smokers. MSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	20221760
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2834
dc.description.abstract	"By incorporating cancer stage into predictive models, the study aims to identify the most effective model or combination of models, offering clearer insight into their decision-making processes and providing a robust tool for personalized treatment and better patient outcomes. This study designs predictive models for lung cancer stage classification among smokers using machine learning (ML) techniques. The approach involves selecting an appropriate patient dataset and applying various ML algorithms to build and evaluate predictive models. Of the models compared, the Naive Bayes classifier performed best on the classification task. It achieved a mean cross-validation score of 98.40%, and its score was consistent across different splits of the data. On the test data it reached an accuracy of 98.48%. Precision and recall are both 98.48%, indicating that the model identifies true positives while keeping both false positives and false negatives low. Its ROC AUC of 99.36% indicates a strong ability to discriminate between cancer stages. Although feature importance is not available for this classifier, the confusion matrix shows that the Naive Bayes model made few misclassifications. Overall, the balanced performance metrics and high accuracy make the Naive Bayes classifier robust and reliable for this particular classification problem. The main objective is to develop an ML model that accurately predicts the stage of lung cancer from patient features, in particular smoking status, with a focus on improving diagnostic accuracy and providing insights for better treatment decisions. The project was limited by difficulties in collecting and analysing accurate data.
The scarcity of extensive prior research in the area and of good-quality datasets further constrained this analysis. The models implemented here need to be tested on a more accurate dataset and reviewed by lung cancer specialists to establish how reliable they are in practice. In addition, technical and programming difficulties limited improvements to the project. Future work should therefore acquire more comprehensive data, improve accuracy in collaboration with industry experts, and make the model more practical."	en_US
dc.language.iso	en	en_US
dc.subject	Machine Learning	en_US
dc.subject	Gradient Boosting	en_US
dc.subject	Prediction	en_US
dc.subject	Lung Cancer	en_US
dc.subject	Smoking	en_US
dc.title	A Comparative Evaluation of Machine Learning Models for Lung Cancer Stage Prediction in Smokers	en_US
dc.type	Thesis	en_US
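
The abstract reports test accuracy, precision and recall for a multi-class stage classifier, but this record does not say how the per-class precision and recall were averaged. As a reading aid only, the sketch below shows in plain Python how these metrics can be derived from a confusion matrix. The function name `multiclass_metrics` and the 3x3 matrix are hypothetical and chosen purely for illustration; they are not the dissertation's code or data.

```python
from typing import Dict, List


def multiclass_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and support-weighted precision/recall.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    weighted_precision = 0.0
    weighted_recall = 0.0
    for k in range(n_classes):
        true_positives = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n_classes))  # column sum
        actual_k = sum(confusion[k])                                   # row sum (support)
        precision_k = true_positives / predicted_k if predicted_k else 0.0
        recall_k = true_positives / actual_k if actual_k else 0.0
        # Weight each class by its share of the true cases.
        weighted_precision += precision_k * actual_k / total
        weighted_recall += recall_k * actual_k / total

    return {
        "accuracy": correct / total,
        "weighted_precision": weighted_precision,
        "weighted_recall": weighted_recall,
    }


# Hypothetical confusion matrix for three stages (illustrative numbers only).
example_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = multiclass_metrics(example_confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With support weighting, recall is algebraically equal to accuracy, so matching accuracy and recall figures are expected under that convention; whether the dissertation used weighted, micro or macro averaging is an assumption that would need to be checked against the full text.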