Digital Repository

SchemaRAG: Semantic dictionary based Agentic LLM for Text-to-SQL

dc.contributor.author Nugara, Binura
dc.date.accessioned 2026-03-10T07:51:35Z
dc.date.available 2026-03-10T07:51:35Z
dc.date.issued 2025
dc.identifier.citation Nugara, Binura (2025) SchemaRAG: Semantic dictionary based Agentic LLM for Text-to-SQL. MSc Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20211182
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2896
dc.description.abstract Text-to-SQL is an efficient way to generate SQL queries and retrieve data from databases using natural language. This approach helps non-technical users in domains such as education, healthcare, and finance access information in databases for use cases such as decision making, reporting, and analytics, without the technical expertise needed to write SQL queries. With the rapid development of Large Language Models (LLMs) in recent years, the number of Text-to-SQL studies has increased, and the performance of these systems has improved greatly owing to the advanced reasoning capabilities of LLMs. However, due to the inherent ambiguity of natural language and the complexity of database schemas and the required SQL queries, existing Text-to-SQL solutions suffer from issues such as hallucination, lack of domain knowledge, and an inability to accurately generate complex queries involving join operations. To address these limitations, this study implemented an agentic LLM system for Text-to-SQL tasks that uses Retrieval-Augmented Generation (RAG) to enhance the LLM's domain knowledge. Four LLM agents are used in the Text-to-SQL workflow. The process starts with an Assistant Agent that captures the user's natural language question and delegates schema retrieval, query generation, and query validation, in that order, to the Retrieval Agent, the Query Generation Agent, and the Validation Agent respectively. Schema knowledge is stored in a database as vector embeddings, which are used to fetch schema information relevant to the user's question based on semantic similarity. SchemaRAG achieved an execution accuracy (EX) of 73.9% during the testing phase. Evaluation feedback indicates that the proposed system performs reasonably well given the minimal effort it requires for SQL query generation compared with existing work.
The proposed system shows strong potential for real-world use cases through usability improvements such as minimal reliance on user interaction and access for non-technical users via automatic schema knowledge retrieval. en_US
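Two mechanisms in the abstract can be illustrated in a few lines: fetching schema entries by semantic similarity to the question (the Retrieval Agent's job), and scoring execution accuracy (EX). The following is a minimal sketch under stated assumptions, not the thesis implementation. The toy schema dictionary and every name in it (`SCHEMA_DICTIONARY`, `embed`, `cosine`, `retrieve_schema`, `execution_match`, `execution_accuracy`) are hypothetical. A bag-of-words vector stands in for the learned embeddings and vector database the abstract describes, and EX is computed with an order-insensitive multiset comparison of result rows, which is one common convention and may differ from the rule used in the study. The LLM-driven Assistant, Query Generation, and Validation agents are not modeled.

```python
import math
import re
import sqlite3
from collections import Counter

# Hypothetical "semantic dictionary": schema objects mapped to natural-language
# descriptions. Illustrative only; not taken from the thesis.
SCHEMA_DICTIONARY = {
    "students": "students table: student id, name, enrolment year",
    "courses": "courses table: course id, title, credits",
    "enrolments": "enrolments table: links student id to course id with a grade",
}


def embed(text: str) -> Counter:
    # Toy bag-of-words vector (assumption); a real system would use a
    # sentence-embedding model and store vectors in a vector database.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_schema(question: str, k: int = 2) -> list[str]:
    # Rank schema entries by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(
        SCHEMA_DICTIONARY,
        key=lambda name: cosine(q, embed(SCHEMA_DICTIONARY[name])),
        reverse=True,
    )
    return ranked[:k]


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    # A prediction counts as correct if it runs and returns the same rows as
    # the gold query (compared as multisets, so row order is ignored).
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # queries that fail to execute score zero
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    # EX = (number of execution matches) / (number of evaluated questions).
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
```

For example, `retrieve_schema("List the title and credits of each course", k=1)` ranks the `courses` entry first because it shares the terms "title", "credits", and "course" with the question. In `execution_accuracy`, a generated query that is written differently from the gold query but returns the same rows still counts as correct, which is the defining property of EX as opposed to exact-match metrics.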
dc.language.iso en en_US
dc.subject Large Language Models en_US
dc.subject Retrieval Augmented Generation en_US
dc.subject Natural Language Processing en_US
dc.title SchemaRAG: Semantic dictionary based Agentic LLM for Text-to-SQL en_US
dc.type Thesis en_US