SoftesT: A Novel Approach to Software Effort Estimation by Integrating Ensemble Models and Explainable AI

Sujana, Tharmalingam

dc.contributor.author	Sujana, Tharmalingam
dc.date.accessioned	2026-04-07T05:38:03Z
dc.date.available	2026-04-07T05:38:03Z
dc.date.issued	2025
dc.identifier.citation	Sujana, Tharmalingam (2025) SoftesT: A Novel Approach to Software Effort Estimation by Integrating Ensemble Models and Explainable AI. BSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	20210070
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/3122
dc.description.abstract	Problem: Software effort estimation is a crucial task that helps a project succeed by keeping it within its estimated time and budget. Traditional methods often struggle with the nature of agile projects and lack accuracy, making it difficult for project managers to estimate the time needed to complete a project; as a result, projects can be delayed and exceed their budgets. This project aims to tackle these challenges by building a more accurate and interpretable model that combines advanced machine learning techniques with an explainability-driven feature selection method. Methodology: The proposed solution takes a novel approach using the China dataset. Initially, data preprocessing is performed to handle skewness and outliers and to scale features. SHAP (Explainable AI) values, derived from initial models, are then used to select the most impactful features. The core model is a heterogeneous ensemble combining Linear Regression, Decision Tree, Random Forest, XGBoost, and CatBoost. A custom neural network with a SHAP-guided attention mechanism was also developed and integrated experimentally. Optuna is used to fine-tune the hyperparameters of the neural network component. Results: Testing on the China dataset showed excellent performance for the final ensemble model (combining the base learners trained on SHAP-selected features), which achieved an R² of approximately 0.99 and a Mean Magnitude of Relative Error (MMRE) of around 1.14%. This indicates high accuracy and reliability on the test data. However, the evaluation also revealed that the experimental SHAP-guided attention neural network component performed poorly (negative R²), likely due to data limitations and complexity challenges, and it was therefore excluded from the final ensemble prediction.	en_US
dc.language.iso	en	en_US
dc.subject	SHAP	en_US
dc.subject	Attention	en_US
dc.subject	Mechanism	en_US
dc.title	SoftesT: A Novel Approach to Software Effort Estimation by Integrating Ensemble Models and Explainable AI	en_US
dc.type	Thesis	en_US
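
Note on the reported metrics: the abstract quotes an MMRE of around 1.14% and a negative R² for the attention network. The following is a minimal sketch of how these two metrics are conventionally computed, not the dissertation's own code. The function names and the effort values are hypothetical and are not taken from the China dataset; they only illustrate that R² falls below zero when predictions are worse than always predicting the mean.

```python
from statistics import mean


def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical effort values (illustrative only, not from the China dataset).
actual = [100.0, 200.0, 400.0]
close = [101.0, 198.0, 404.0]   # every prediction is off by 1%
poor = [400.0, 100.0, 200.0]    # predictions worse than using the mean

print(f"MMRE (close predictions): {mmre(actual, close):.2%}")
print(f"R2   (close predictions): {r_squared(actual, close):.5f}")
print(f"R2   (poor predictions):  {r_squared(actual, poor):.5f}")
```

Because MMRE divides each error by the actual value, it is undefined when an actual effort is zero, so such records would need to be filtered out or handled separately before the metric is computed.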