Abstract:
"This research project applies AI to legal assistance by developing and implementing a chatbot
capable of handling legal inquiries within the context of Sri Lankan jurisprudence. The central
challenge lies in building a domain-specific dataset creation pipeline and training a customized
chatbot to provide accurate and comprehensive legal advice.
The project first establishes this pipeline, and the resulting dataset is then used to train a
customized chatbot built on the pre-trained Llama-2-7b-chat-hf model. Leveraging technologies such
as PyMuPDF, Transformers, and spaCy, the
project demonstrates an efficient method for legal knowledge retrieval. Integration with the
Discord API makes the chatbot available on the Discord platform, giving users an interactive and
accessible way to seek legal advice.
The model's performance improved consistently over successive epochs, reaching an accuracy of
approximately 0.9537 after 692 steps. Training was repeated multiple times, with an average
accuracy of about 0.9 per epoch. This progression was captured in a plot labelled “train/epoch,”
which showed a linear increase in a metric, presumably accuracy, over time. As a result, the
developed chatbot can address a broad range of legal inquiries in the context of Sri Lankan law,
paving the way for more advanced and user-friendly legal assistance solutions in the future."