Digital Repository

A Comparative Evaluation of Machine Learning Models for Lung Cancer Stage Prediction in Smokers

dc.contributor.author Liyanage, Dinithi
dc.date.accessioned 2025-07-01T10:25:59Z
dc.date.available 2025-07-01T10:25:59Z
dc.date.issued 2024
dc.identifier.citation Liyanage, Dinithi (2024) A Comparative Evaluation of Machine Learning Models for Lung Cancer Stage Prediction in Smokers. MSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20221760
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2834
dc.description.abstract "By incorporating cancer stage into predictive models, the study aims to identify the most effective models or combinations of models, offering clearer insights into their decision-making processes and providing a robust tool for personalized treatment and better patient outcomes. This study designs predictive models for lung cancer stage classification among smokers using ML techniques. The approach involves selecting an appropriate patient dataset and applying various ML algorithms to build and evaluate predictive models. Among the models compared, the Naive Bayes classifier performed best on the classification task. It achieved a mean cross-validation score of 98.40%, and its score was consistent across different splits of the data. On the test data it reached an accuracy of 98.48%. Precision and recall are both 98.48%, indicating that the model identifies true positives while keeping false positives and false negatives low. Its ROC AUC of 99.36% indicates a strong ability to distinguish between cancer stages. Although feature importance is not available for this classifier, the confusion matrix shows that the Naive Bayes model made few misclassifications. Overall, the balanced performance metrics and high accuracy make the Naive Bayes classifier robust and reliable for this particular classification problem. The main goal is to develop an ML model that can accurately classify or predict the stage of lung cancer based on patient features, specifically smoking status, with a focus on improving diagnostic accuracy and providing insights for better treatment decisions. This project was limited in some respects by the difficulty of collecting and analysing accurate data.
The limited amount of extensive research in the area and the lack of a good-quality dataset further constrained this analysis. The models implemented here need to be tested on a more accurate dataset and reviewed by lung cancer specialists to assess how reliable they are in practice. In addition, technical and programming difficulties limited improvements to the project. Future work should therefore acquire more comprehensive data, improve accuracy in collaboration with experts in the field, and make the model more practical." en_US
dc.language.iso en en_US
dc.subject Machine Learning en_US
dc.subject Gradient Boosting en_US
dc.subject Prediction en_US
dc.subject Lung Cancer en_US
dc.subject Smoking en_US
dc.title A Comparative Evaluation of Machine Learning Models for Lung Cancer Stage Prediction in Smokers en_US
dc.type Thesis en_US
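
The abstract reports accuracy, precision and recall for a multi-class stage classifier but does not say how precision and recall were averaged across stages. The sketch below is not the thesis's code. It uses a hypothetical confusion matrix with invented counts to show how these metrics can be derived, and why support-weighted recall always equals accuracy in a single-label multi-class problem. If weighted or micro averaging was used, that would be consistent with the matching 98.48% figures, though this is an assumption. All function names here are illustrative.

```python
# Hypothetical 3-stage confusion matrix: rows = true stage, columns = predicted stage.
# The counts are invented for illustration and are NOT results from the thesis.
CONFUSION = [
    [30, 2, 0],
    [1, 33, 0],
    [0, 0, 34],
]


def accuracy(cm):
    """Fraction of all cases that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_precision_recall(cm):
    """Return a list of (precision, recall) pairs, one per class."""
    n = len(cm)
    results = []
    for k in range(n):
        true_positive = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum (support)
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


def weighted_average(values, weights):
    """Average of values, weighted by class support."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def summarize(cm):
    """Accuracy plus support-weighted precision and recall."""
    support = [sum(row) for row in cm]
    pairs = per_class_precision_recall(cm)
    return {
        "accuracy": accuracy(cm),
        "weighted_precision": weighted_average([p for p, _ in pairs], support),
        "weighted_recall": weighted_average([r for _, r in pairs], support),
    }


if __name__ == "__main__":
    for name, value in summarize(CONFUSION).items():
        print(f"{name}: {value:.4f}")
```

In the weighted recall, each class's support cancels against the denominator of its recall, leaving the sum of the diagonal divided by the total, which is the definition of accuracy. Weighted precision has no such identity and will generally differ slightly from accuracy.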

