Abstract:
"Onboarding new members onto legacy or large-scale software applications is often
challenging. New members must adapt to the application in a short span of time. One
of the major problems they face during onboarding is difficulty finding project-related
information. Most of the time, these newcomers depend on people who are currently
working on the project, which reduces the productivity of both parties.
To tackle this problem, this research proposes a Machine Learning (ML)-driven approach
that leverages Natural Language Processing (NLP) and knowledge modeling techniques. By
aggregating project information from various sources such as code repositories, pull requests, and
communication channels, the methodology aims to provide context-aware guidance to new
developers. By integrating a Bidirectional Encoder Representations from Transformers
(BERT) model, the approach seeks to streamline onboarding by reducing newcomers'
adaptation time and improving the overall productivity of development teams.
Preliminary results from the prototype implementation are promising. Using
a dataset focused on information from agile open-source software projects, the proposed
approach is effective in providing relevant information to new developers. Initial
quantitative assessments indicate an improvement in onboarding efficiency, with a reduction in the
time required for new members to access critical project information."