Abstract:
"Stuttering is a common speech disorder, occurring at all ages, that involves disruptions or disfluencies in a person’s speech. Millions of people stutter, and the condition warrants attention because it is associated with generalized anxiety, social anxiety, speech phobia, and other difficulties.
These signs, issues, and emotional states can also appear in online interactions such as sending or sharing audio files, creating podcasts, and many other activities.
Stuttering identification has been attempted with a variety of techniques; in recent years, deep learning has been used more heavily than other approaches and has produced positive results.
The author proposes a system for detecting stuttering in audio files. It combines a CNN architecture, MFCC feature extraction that supplies the model input, and Keras classifiers tuned with GridSearchCV to find the best hyperparameters, and it delivers promising results compared to state-of-the-art systems. The model produces a binary output indicating whether or not the input file contains stuttering.
The dataset used is SEP-28k, which is provided by the University of California and contains 28122 short audio clips of 3 seconds each.
This work is evaluated against other state-of-the-art approaches to stuttering detection from recent years."
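
As a rough illustration of the pipeline the abstract describes, the sketch below works out the size of the MFCC matrix a 3-second clip would feed to the CNN and lists the hyperparameter combinations an exhaustive grid search would try. It is not the author's implementation: the sample rate, hop length, number of coefficients, and grid values are all assumptions chosen for illustration, and the helper names `mfcc_frame_count` and `expand_grid` are hypothetical.

```python
from itertools import product


def mfcc_frame_count(duration_s, sample_rate, hop_length):
    """Frames from a centered short-time transform: 1 + floor(n_samples / hop_length)."""
    n_samples = int(duration_s * sample_rate)
    return 1 + n_samples // hop_length


def expand_grid(param_grid):
    """Return every hyperparameter combination, as an exhaustive grid search enumerates them."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Assumed values, not taken from the abstract.
SAMPLE_RATE = 16000   # Hz
HOP_LENGTH = 512      # samples between successive frames
N_MFCC = 13           # MFCC coefficients kept per frame

# The abstract states that each clip lasts 3 seconds.
frames = mfcc_frame_count(3.0, SAMPLE_RATE, HOP_LENGTH)

# One clip as a single-channel "image" for a 2D CNN: (coefficients, frames, channels).
input_shape = (N_MFCC, frames, 1)

# Hypothetical search space; the abstract does not list the tuned hyperparameters.
param_grid = {
    "batch_size": [32, 64],
    "epochs": [10, 20],
    "learning_rate": [1e-3, 1e-4],
}
candidates = expand_grid(param_grid)

print("input shape:", input_shape)
print("candidate settings:", len(candidates))
```

Under these assumptions each clip becomes a 13 x 94 matrix, and the grid holds 8 candidate settings. A grid search such as scikit-learn's GridSearchCV would train and cross-validate the classifier once per candidate and keep the best-scoring one.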