Abstract:
"The gap between physical and virtual classrooms have grown thin in the recent years due to the
advancement of technology, remote learning and convenience. Even having many advantages
over a physical classroom in many ways, the leading reason for teachers and students prefer a
real classroom is the teacher’s inability to monitor student focus level in a virtual classroom.
Almost all existing solutions use video or image classification methods, ranging from binary
to multi-class approaches, or regression. Presented here is a novel approach to student focus
evaluation for virtual classrooms using computer vision, involving a Spatiotemporal
Convolutional Autoencoder - Pose Estimator hybrid model, by considering the
evaluation problem as a video anomaly detection problem. The paper offers deep learning
models that can be implemented in a system to evaluate student focus levels and present them
to the teacher, making the teacher more aware of those levels and able to shift teaching
methods dynamically to keep more students engaged with the learning materials, increasing the effectiveness
of the class. In tests conducted on the EmotiW dataset, the system achieves an area under the
receiver operating characteristic curve of ~0.8, an area under the precision–recall curve of
0.53 and a false alarm rate of 0.02, which is an improvement on existing systems. This
suggests that considering the focus evaluation problem as a video anomaly detection problem
is successful and should be researched further, although there is room for improvement by
using different video anomaly detection methods and integrating active learning into the
system."
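
For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how they could be computed from per-clip anomaly scores. It is not the paper's evaluation code. The function names, the toy labels and scores, the threshold of 0.5, and the convention that label 1 marks an unfocused (anomalous) clip are all assumptions made for illustration, as is the definition of false alarm rate as false positives divided by all normal clips.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    anomalous clip scores higher than a random normal clip (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal clip")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).
    Tied scores are not treated specially; they keep their input order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one anomalous clip")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def false_alarm_rate(labels: Sequence[int], scores: Sequence[float],
                     threshold: float) -> float:
    """Fraction of normal clips flagged as anomalous (score >= threshold).
    Assumed definition: FP / (FP + TN)."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one normal clip")
    return sum(1 for s in neg if s >= threshold) / len(neg)


# Toy data (hypothetical): 1 = unfocused clip, 0 = focused clip.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.6, 0.9]

print(auroc(labels, scores))                  # 0.875
print(average_precision(labels, scores))      # 5/6, about 0.833
print(false_alarm_rate(labels, scores, 0.5))  # 0.25
```

The first two metrics are threshold-free and summarize how well the scores rank anomalous clips above normal ones, while the false alarm rate depends on the chosen operating threshold.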