Abstract:
Accurately predicting bug fix times in software projects is a crucial yet elusive task. While existing research has explored bug prediction, predicting bug fix times on GitHub with machine learning techniques remains largely unexplored. In particular, there are few generalized models capable of accurately predicting fix times across diverse GitHub repositories: existing studies often focus on specific projects or programming languages, which limits their broader applicability.
This research aims to forecast bug fix times on GitHub using a multifaceted approach. It applies machine learning techniques such as Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Decision Trees to build individual prediction models. These models are then combined in an ensemble to capitalize on their individual strengths and mitigate their weaknesses, potentially yielding more accurate and generalizable predictions. Additionally, the research explores how various data fields affect the predictions.
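The ensemble step can be sketched as hard (majority) voting over the individual models' class predictions. This is a minimal sketch under stated assumptions: it assumes fix times are discretized into classes and that predictions are combined by simple voting (weighted or stacked ensembles are equally plausible). The helper `majority_vote` and the labels "fast", "medium", and "slow" are hypothetical and not taken from this research. scikit-learn's `VotingClassifier` with `voting="hard"` provides a library version, although its tie-breaking rule may differ from the one used here.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-model class predictions by hard (majority) voting.

    predictions[m][i] is model m's predicted label for sample i.
    On a tie, the label voted by the earliest-listed model among the
    tied labels wins.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same samples")

    combined = []
    for i in range(n_samples):
        votes = [p[i] for p in predictions]  # one vote per model
        counts = Counter(votes)
        top = max(counts.values())
        # First vote (in model order) whose label reached the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical fix-time classes predicted by three models for four issues
# (illustrative data only, not results from this research).
svm_pred = ["fast", "slow", "medium", "fast"]
knn_pred = ["fast", "medium", "medium", "slow"]
tree_pred = ["slow", "slow", "fast", "medium"]

print(majority_vote([svm_pred, knn_pred, tree_pred]))
# ['fast', 'slow', 'medium', 'fast']
```

In the last sample all three models disagree, so the tie-break returns the SVM's label because it is listed first; ordering the models by validation accuracy is one simple way to make such ties favour the strongest model.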
In initial tests of KNN, SVM, and Random Forest models on the Ansible GitHub repository, performance varied across models. KNN reached 61% accuracy and 61% recall but only 47% precision, which may indicate overfitting. SVM achieved the highest accuracy (65%), with 51% precision and 65% recall, suggesting comparatively good generalizability. Random Forest performed consistently across metrics but had the lowest accuracy (57%), indicating stability at the cost of overall effectiveness. These preliminary results provide a starting point for further research.
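For both KNN and SVM the reported recall equals the reported accuracy (61% and 65%). One possible explanation, which the abstract does not confirm, is that precision and recall are support-weighted averages over fix-time classes (as with scikit-learn's `average="weighted"` option). In that case weighted recall, the sum over classes c of (n_c / N) * (TP_c / n_c), simplifies to (sum of TP_c) / N, which is exactly accuracy. The sketch below assumes this averaging scheme; the helper `weighted_metrics` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def weighted_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall), where precision and recall are
    per-class scores averaged with weights proportional to each class's
    support (its count in y_true). Assumed scheme, for illustration only."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(y_true)
    support = Counter(y_true)      # true count per class
    predicted = Counter(y_pred)    # predicted count per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP per class

    accuracy = sum(hits.values()) / n
    precision = recall = 0.0
    for cls, n_cls in support.items():
        weight = n_cls / n
        # Convention: a class that is never predicted contributes precision 0.
        if predicted[cls]:
            precision += weight * hits[cls] / predicted[cls]
        # weight * TP / n_cls == TP / n, so summed recall equals accuracy.
        recall += weight * hits[cls] / n_cls
    return accuracy, precision, recall


# Hypothetical fix-time labels for six issues (not data from this research).
y_true = ["fast", "fast", "fast", "slow", "slow", "medium"]
y_pred = ["fast", "fast", "slow", "slow", "fast", "fast"]
acc, prec, rec = weighted_metrics(y_true, y_pred)
print(round(acc, 3), round(prec, 3), round(rec, 3))  # 0.5 0.417 0.5
```

If this averaging scheme was used, the recall figures carry no information beyond accuracy, and the gap between accuracy and precision is the more informative signal when comparing the models.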