Abstract:
Assistive software such as screen readers is unable to describe images or videos for visually impaired people. Although recent research has found ways to describe an image automatically, describing the content of a video remains an open problem. Visually impaired people find it difficult to follow video content when the soundtrack alone does not convey what is happening on screen. Video description is currently provided only through digital television, and only for selected programs and movies. As an initiative to describe video content for visually impaired people, the proposed solution acts as a video player that automatically recognizes the ongoing human action on screen, associates a textual description with it, and narrates that description to the blind user. Because the human actions in the video must be recognized in real time, fast and reliable feature extraction and classification methods must be adopted. For each frame, a feature set is extracted from the projection histograms of the foreground mask: the number of moving pixels in each row and column of the frame identifies the instantaneous position of the person. A Support Vector Machine (SVM) classifies the features extracted from each frame, and the final classification is obtained by analyzing the frames in segments. The recognized actions are then converted from text to speech.
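
The abstract does not fix an implementation of the per-frame features, but they are straightforward to sketch. The following Python/OpenCV code assumes MOG2 background subtraction for the foreground mask and a fixed number of histogram bins (both assumptions, not stated above); the row and column projections count the moving pixels per row and per column, as described.

```python
import cv2
import numpy as np

def frame_features(frame, bg_subtractor, num_bins=32):
    """Feature vector for one frame: the row and column projection
    histograms of the foreground (moving-pixel) mask."""
    # Foreground mask: moving pixels are non-zero
    mask = bg_subtractor.apply(frame)
    mask = (mask > 0).astype(np.float32)

    # Count moving pixels along each row and each column
    row_proj = mask.sum(axis=1)   # one value per frame row
    col_proj = mask.sum(axis=0)   # one value per frame column

    # Resample both projections to a fixed number of bins so the
    # feature length is independent of the frame resolution
    row_hist = np.interp(np.linspace(0, len(row_proj) - 1, num_bins),
                         np.arange(len(row_proj)), row_proj)
    col_hist = np.interp(np.linspace(0, len(col_proj) - 1, num_bins),
                         np.arange(len(col_proj)), col_proj)

    feat = np.concatenate([row_hist, col_hist])
    # Normalize so the features are less sensitive to apparent person size
    norm = feat.sum()
    return feat / norm if norm > 0 else feat

# Example: bg = cv2.createBackgroundSubtractorMOG2()
#          feats = frame_features(frame, bg)   # length 2 * num_bins
```

Resampling to a fixed bin count keeps the feature dimension constant across videos of different resolutions, which the per-frame classifier requires.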
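The segment-level decision rule and the speech engine are likewise left unspecified. A minimal sketch, assuming a majority vote over per-frame SVM predictions and the pyttsx3 library for text-to-speech (both assumptions, not named above):

```python
from collections import Counter
import numpy as np
from sklearn.svm import SVC
import pyttsx3  # offline text-to-speech; any TTS engine would do

# Placeholder training data: in practice these are the projection-histogram
# features from the sketch above (length 64 = 2 * 32 bins), manually labelled.
rng = np.random.default_rng(0)
X_train = rng.random((200, 64))
y_train = rng.choice(["walking", "sitting", "waving"], size=200)

clf = SVC(kernel="rbf")   # per-frame action classifier
clf.fit(X_train, y_train)

def classify_segment(frame_feats):
    """Classify each frame with the SVM, then aggregate over the
    segment by majority vote (one plausible segment-analysis rule)."""
    per_frame = clf.predict(frame_feats)
    return Counter(per_frame).most_common(1)[0][0]

def narrate(action):
    """Speak the recognized action label aloud for the user."""
    engine = pyttsx3.init()
    engine.say(f"The person is {action}")
    engine.runAndWait()

segment = rng.random((30, 64))   # e.g. one second of frame features
narrate(classify_segment(segment))
```

Voting over a segment of consecutive frames smooths out spurious per-frame misclassifications before the label is narrated.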