Deep Learning based Optical Character Recognition System for Sinhala Handwritten Text with Tri-Level Segmentation including Overlapped and Touched Character Segmentation

Dias, Nathindu

URI: http://dlib.iit.ac.lk/xmlui/handle/123456789/1837

Date: 2023

Abstract:

"Computer vision has been extended to recognize handwritten and printed characters in order to improve interaction between humans and computers. For Asian languages, however, this remains an open problem. Because Sri Lanka is the only country in Asia where Sinhala is an official language, the recognition of Sinhala characters remains an unsolved problem. Most past work on character recognition uses pattern matching and image processing approaches. However, these methods adapt poorly to variation in handwriting. In addition, most past work did not address text segmentation for the Sinhala language. Segmentation can be considered the key factor in text recognition, because a single segmentation error lowers the accuracy of the entire recognition process. By filling these gaps in Sinhala text recognition, the Akshara – Sinhala HCR System recognizes Sinhala handwritten text from image input accurately and efficiently. The system uses pixel-based algorithms to segment the input text at three levels (line, word, and character), including overlapped characters. The author proposes a novel algorithm for touched character segmentation in this research. After segmenting a paragraph down to the character level, the system uses a Convolutional Neural Network (CNN) based architecture to recognize each character image and convert it to digital text. The system achieves an average of 91.8% in text segmentation and a 93.37% accuracy rate in character recognition across 65 supported character classes. The novelty of this research lies in the new approach to segmenting touched characters and in the CNN architecture, which together address a gap in Sinhala handwritten text recognition."
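The abstract does not say which pixel-based algorithms the system uses. As an illustration only, one common pixel-based technique is the projection profile: count ink pixels per row to find text lines, then per column inside a line to find word or character candidates. The sketch below assumes a binarized image stored as nested lists (1 = ink, 0 = background); the function names `find_runs`, `segment_lines`, and `segment_columns` are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed format)


def find_runs(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile >= min_ink."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # a gap closes the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the image edge
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Find text lines from the horizontal projection (ink count per row)."""
    row_profile = [sum(row) for row in image]
    return find_runs(row_profile)


def segment_columns(image: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Find column runs inside rows [top, bottom) from the vertical projection."""
    width = len(image[0]) if image else 0
    col_profile = [
        sum(image[r][c] for r in range(top, bottom)) for c in range(width)
    ]
    return find_runs(col_profile)


# Toy page: two text lines; the first contains two separate blobs.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
first_line_columns = segment_columns(page, *lines[0])
print(lines)               # row ranges of the detected text lines
print(first_line_columns)  # column ranges inside the first line
```

A gap test of this kind only separates components that have at least one empty row or column between them. It cannot split characters whose strokes overlap or touch, which is the case the thesis handles with dedicated overlapped and touched character segmentation.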
