| dc.description.abstract |
Communication barriers for the hearing-impaired community in Sri Lanka persist due to the
lack of accessible, real-time Sri Lankan Sign Language (SSL) translation systems that capture
emotional expressiveness. Existing systems mainly recognize hand gestures but overlook facial
expressions, producing translations that lack emotional nuance and naturalness and limiting
their effectiveness in everyday and critical communication.
This project presents a multimodal, emotion-aware SSL translation system that combines
TimeSformer-based gesture recognition with MediaPipe facial and body landmark fusion. A
facial emotion detection module built on DeepFace extracts the dominant emotion in each
frame, which then conditions expressive speech synthesis through a Typecast API-powered
emotional text-to-speech (TTS) engine. The backend is implemented as FastAPI microservices
deployed on cloud platforms and integrated with a Flutter mobile application. The system
combines deep learning, multimodal fusion, and user-centered design, and was validated
through extensive model training and mixed-method evaluations.
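To make the emotion-detection stage concrete, the following is a minimal sketch, not the
project's actual code: it assumes a FastAPI microservice exposing a hypothetical /emotion
endpoint that runs DeepFace's analyze call on an uploaded frame and returns the dominant
emotion label (DeepFace's return shape varies slightly across versions).

    # Minimal sketch of an emotion-detection microservice.
    # The /emotion endpoint name and payload shape are hypothetical.
    import tempfile

    from deepface import DeepFace
    from fastapi import FastAPI, File, UploadFile

    app = FastAPI()

    @app.post("/emotion")
    async def detect_emotion(frame: UploadFile = File(...)):
        # Write the uploaded frame to disk so DeepFace can read it.
        with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
            tmp.write(await frame.read())
            path = tmp.name
        # Run only the emotion action; tolerate frames with no visible face.
        results = DeepFace.analyze(
            img_path=path, actions=["emotion"], enforce_detection=False
        )
        # Recent DeepFace versions return a list of per-face result dicts.
        return {"emotion": results[0]["dominant_emotion"]}

The returned label (e.g., "happy" or "angry") would then accompany the recognized gesture
output into the emotional TTS stage described above.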
The prototype achieved around 85% gesture recognition accuracy on SSL datasets and over
80% accuracy in emotion detection, and it generated natural, context-aware speech with
latency suitable for near real-time use. Balanced metrics such as F1 score, precision, and
recall, alongside expert and user feedback, demonstrate the system’s robustness and improved
communication clarity in both everyday and emergency scenarios. |
en_US |