Abstract:
"Essay questions are widely recognised as an effective means of assessing students'
comprehension of learning outcomes during examinations. However, the manual evaluation of
these essay questions poses challenges, requiring consideration of factors such as required
keywords, grammatical accuracy, and overall meaning of the responses. The advent of
digitalised examinations has led to the development of various automated grading techniques for
descriptive questions. While keyword matching and syntax analysis can be helpful tools,
determining the accuracy of a response based solely on meaning presents a significant challenge.
This research introduces GPT-DAE, a proposed system for evaluating answers to descriptive
questions against a predefined answer script. Leveraging a GPT-based approach, the system
identifies semantic relationships within responses to assess the provided answers. During
fine-tuning, one model achieved training and validation losses of 0.0000, while the other
recorded a training loss of 0.2036 and a validation loss of 0.0352. Benchmarking against
human-evaluated question scores revealed an accuracy of over 66%, demonstrating the system's potential.
However, it is acknowledged that further fine-tuning is essential before deploying the prototype
in real-world applications. The study concludes that GPT-DAE has the capability to significantly
enhance descriptive answer grading with improved fine-tuning, paving the way for more
accurate and efficient evaluation in digitalised examination settings.
"