| dc.description.abstract |
Large Language Models have reshaped natural language processing through their ability to produce fluent, human-like answers. However, their tendency to hallucinate, producing plausible but factually inaccurate content, poses a serious risk in the academic sphere, where precision and reliability are essential. Meanwhile, traditional rule-based or intent-based educational chatbots lack the necessary flexibility and fail to support complex, context-dependent student requests. This deficit leaves a wide gap in reliable, dynamic academic support for university learners.
This project proposes EdRAG, a chatbot based on Retrieval-Augmented Generation that builds a hybrid retrieval system with two channels, combining structured knowledge triples with unstructured lecture content. Dense vector search and sparse BM25 retrieval run in parallel to maximise semantic and lexical coverage. The retrieved material is then passed to a memory-aware prompt construction module, which encodes recent conversational context to maintain coherence. To ensure academic relevance, two domain-specific datasets were developed: EdCyberQ, an open-ended question-answering dataset, and the EdRAG knowledge triples.
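The parallel dense-plus-sparse retrieval described above can be sketched as follows. This is a minimal, self-contained illustration, not EdRAG's implementation: the toy corpus, the `alpha` fusion weight, and the use of term-count cosine similarity as a stand-in for real embedding search are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical mini-corpus standing in for lecture chunks and knowledge triples.
DOCS = [
    "firewalls filter network traffic based on security rules",
    "symmetric encryption uses a single shared key",
    "phishing attacks trick users into revealing credentials",
]

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Sparse channel: lexical BM25 scoring over the corpus."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def dense_scores(query, docs):
    """Dense channel (toy): cosine similarity over term-count vectors,
    standing in for embedding-based semantic search."""
    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    q = Counter(tokenize(query))
    return [cosine(q, Counter(tokenize(d))) for d in docs]

def hybrid_retrieve(query, docs, alpha=0.5, top_k=2):
    """Run both channels in parallel, fuse normalised scores, return top_k."""
    def norm(xs):
        hi = max(xs) or 1.0
        return [x / hi for x in xs]
    sparse = norm(bm25_scores(query, docs))
    dense = norm(dense_scores(query, docs))
    fused = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]
    ranked = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return [docs[i] for i in ranked[:top_k]]

print(hybrid_retrieve("how does a firewall filter traffic", DOCS))
```

The linear score fusion with weight `alpha` is one common way to merge the two channels; rank-based fusion (e.g. reciprocal rank fusion) is an equally plausible choice and the abstract does not specify which EdRAG uses.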
The evaluation measures are BERTScore (Precision: 0.8920, Recall: 0.9168, F1: 0.9042) and RAGAS metrics (Faithfulness: 0.8576, Answer Relevancy: 0.8486, Context Precision: 0.8147, Context Recall: 0.8641). Comparisons within the same retrieval pipeline show that LLaMA 3.3 70B outperforms GPT-4o in both factual grounding and answer relevance, despite being an open-source model. These findings indicate that EdRAG effectively reduces hallucinations and delivers accurate academic assistance, offering a domain-generalizable solution for AI-enhanced learning environments. |
en_US |