| dc.description.abstract |
Large Language Models have reshaped natural language processing through their ability to produce fluent, human-like answers. However, their tendency to hallucinate, producing plausible but factually inaccurate content, poses a serious risk in the academic sphere, where precision and reliability are essential. Meanwhile, traditional rule-based or intent-based educational chatbots lack the necessary flexibility and fail to support complex, context-dependent student requests. This deficit leaves a wide gap in reliable, dynamic academic support for university learners.
This project proposes EdRAG, a chatbot based on Retrieval-Augmented Generation that builds a hybrid retrieval system with two channels, combining structured knowledge triples with unstructured lecture content. Dense vector search and sparse BM25 retrieval run in parallel to maximise semantic and lexical coverage. The retrieved material is then passed to a memory-aware prompt construction module, which encodes recent conversational context to maintain coherence. To ensure academic relevance, two domain-specific datasets were developed: EdCyberQ, an open-ended question-answering dataset, and the EdRAG knowledge triples.
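The parallel dense-plus-sparse retrieval described above can be sketched as follows. This is a minimal, self-contained illustration, not EdRAG's implementation: the toy corpus, the `alpha` fusion weight, and the use of term-count cosine similarity as a stand-in for real embedding search are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical mini-corpus standing in for lecture chunks and knowledge triples.
DOCS = [
    "firewalls filter network traffic based on security rules",
    "symmetric encryption uses a single shared key",
    "phishing attacks trick users into revealing credentials",
]

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Sparse channel: lexical BM25 scoring over the corpus."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def dense_scores(query, docs):
    """Dense channel (toy): cosine similarity over term-count vectors,
    standing in for embedding-based semantic search."""
    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    q = Counter(tokenize(query))
    return [cosine(q, Counter(tokenize(d))) for d in docs]

def hybrid_retrieve(query, docs, alpha=0.5, top_k=2):
    """Run both channels in parallel, fuse normalised scores, return top_k."""
    def norm(xs):
        hi = max(xs) or 1.0
        return [x / hi for x in xs]
    sparse = norm(bm25_scores(query, docs))
    dense = norm(dense_scores(query, docs))
    fused = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]
    ranked = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return [docs[i] for i in ranked[:top_k]]

print(hybrid_retrieve("how does a firewall filter traffic", DOCS))
```

The linear score fusion with weight `alpha` is one common way to merge the two channels; rank-based fusion (e.g. reciprocal rank fusion) is an equally plausible choice and the abstract does not specify which EdRAG uses.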
The evaluation measures are BERTScore (Precision: 0.8920, Recall: 0.9168, F1: 0.9042) and RAGAS metrics (Faithfulness: 0.8576, Answer Relevancy: 0.8486, Context Precision: 0.8147, Context Recall: 0.8641). Comparisons within the same retrieval pipeline show that LLaMA 3.3 70B outperforms GPT-4o in both factual grounding and answer relevance, despite being an open-source model. These findings indicate that EdRAG effectively reduces hallucinations and delivers accurate academic assistance, offering a domain-generalizable solution for AI-enhanced learning environments. |
en_US |