In this paper, the Deep Long Short-Term Memory Autoencoder (DLAE), a regularized deep learning model, is proposed for the automatic severity assessment of phonological deviations, which are crucial stuttering markers in children. This automatic, noninvasive severity assessment plays a key role in early diagnosis, progress inference, and post-care for patients with specific speech disorders. The proposed model implements a multi-layered Autoencoder in the Encoder–Decoder architecture of the Long Short-Term Memory (LSTM) model, with hierarchically appended hidden layers and hidden units. The DLAE has a definite advantage over the baseline Autoencoders. During the training phase, the DLAE reconstructs the phonological features in an unsupervised fashion, and the latent bottleneck features are extracted from the Encoder. The trained DLAE, regularized with dropout, is then used to predict the severity of the phonological deviation, with higher precision and classification accuracy than the baseline models.
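The encoder-decoder idea described in this abstract can be illustrated with a minimal, untrained forward pass. This is only a sketch under stated assumptions, not the authors' implementation: it uses a single LSTM layer on each side rather than the multi-layered stack, plain Python lists instead of a deep learning framework, and hypothetical names (`init_lstm`, `lstm_step`, `encode`, `decode`, `dropout`) and sizes chosen purely for illustration. It shows how the encoder's final hidden state acts as the bottleneck feature vector and how the decoder unrolls that vector back into a sequence.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, rng):
    """Small random weights for the input (i), forget (f), cell (g), output (o) gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {g: {"W": mat(hidden_size, input_size + hidden_size),
                "b": [0.0] * hidden_size} for g in "ifgo"}


def lstm_step(params, x, h, c):
    """One standard LSTM time step on the concatenated vector [x, h]."""
    z = list(x) + list(h)

    def gate(name, act):
        p = params[name]
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(p["W"], p["b"])]

    i, f = gate("i", sigmoid), gate("f", sigmoid)
    g, o = gate("g", math.tanh), gate("o", sigmoid)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def dropout(vec, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def encode(params, sequence):
    """Run the encoder over all frames; the final hidden state is the bottleneck."""
    hidden = len(params["i"]["b"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in sequence:
        h, c = lstm_step(params, frame, h, c)
    return h


def decode(params, out_w, code, steps):
    """Feed the bottleneck vector at every step and project back to feature space."""
    hidden = len(params["i"]["b"])
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for _ in range(steps):
        h, c = lstm_step(params, code, h, c)
        outputs.append([sum(w * v for w, v in zip(row, h)) for row in out_w])
    return outputs


# Hypothetical sizes: 5 frames of 4 features, a 2-unit bottleneck, 3 decoder units.
rng = random.Random(0)
sequence = [[rng.random() for _ in range(4)] for _ in range(5)]
encoder = init_lstm(input_size=4, hidden_size=2, rng=rng)
decoder = init_lstm(input_size=2, hidden_size=3, rng=rng)
out_w = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(4)]

bottleneck = encode(encoder, sequence)               # latent features
regularized = dropout(bottleneck, rate=0.5, rng=rng)  # training-time regularization
reconstruction = decode(decoder, out_w, bottleneck, steps=len(sequence))
```

In a trained model, the weights would be fitted by minimizing the reconstruction error between `sequence` and `reconstruction`, and `bottleneck` would then be passed to a downstream severity classifier.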
In this paper, Weight Decorrelated Stacked Autoencoder-Deep Neural Decision Trees (WDSAE-DNDT), a novel hybrid model, is proposed for automating the assessment of children’s speech fluency disorders by discerning their disfluencies. In fluency disorder classification, it is imperative to know how each feature contributes to the classification, and not only the diagnosis itself, so the depth-modified DNDT acts as the best discriminator because it is interpretable by its very nature. The WDSAE presents the DNDT with a high-level latent representation of the disfluent speech. A fusion feature vector was built by combining the prosodic cues from disfluent speech segments with the WDSAE-based bottleneck features. The proposed hybrid model was compared with the experimented baseline models. Further analysis was carried out to check the impact of the tree cut points for each feature and of the number of epochs on the prediction accuracy of the hybrid model. The proposed hybrid model, when trained on the fusion feature set, shows appreciable improvement in the area under the Receiver Operating Characteristic (ROC) curve, classification accuracy, Kappa statistic, and Jaccard similarity index. The WDSAE-DNDT demonstrates higher precision than the baseline models in setting a clinical benchmark to distinguish subjects with dysphemia from those with Specific Language Impairment.
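The role of per-feature cut points in a DNDT can be sketched with the soft binning function from the original Deep Neural Decision Trees formulation (Yang et al.), in which each feature is softly assigned to bins and the per-feature bin vectors are combined with a Kronecker product to index the leaves. The sketch below is an assumption-laden illustration, not the depth-modified variant used in the paper: the cut points, temperature, feature values, and the helper names `soft_bin`, `kron`, and `fuse` are all hypothetical.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_bin(x, cut_points, temperature=0.1):
    """Soft binning: softmax((w * x + b) / temperature) over n + 1 bins."""
    cuts = sorted(cut_points)
    w = [k + 1 for k in range(len(cuts) + 1)]  # w = [1, 2, ..., n + 1]
    b = [0.0]
    for cut in cuts:                           # b = [0, -c1, -c1 - c2, ...]
        b.append(b[-1] - cut)
    logits = [(wk * x + bk) / temperature for wk, bk in zip(w, b)]
    return softmax(logits)


def kron(a, b):
    """Kronecker product of two vectors; each entry corresponds to one leaf."""
    return [ai * bj for ai in a for bj in b]


def fuse(prosodic, bottleneck):
    """Fusion feature vector as a plain concatenation (an assumption)."""
    return list(prosodic) + list(bottleneck)


# Hypothetical values: one prosodic cue and one bottleneck feature, two cut points each.
fused = fuse([0.5], [0.9])
bins_first = soft_bin(fused[0], [0.33, 0.66])
bins_second = soft_bin(fused[1], [0.33, 0.66])
leaves = kron(bins_first, bins_second)  # 3 x 3 = 9 soft leaf memberships
```

With a low temperature the bin probabilities approach a hard assignment, so the learned cut points can be read directly as thresholds on each feature. Because the number of leaves grows multiplicatively with the number of features and cut points, the choice of cut points per feature directly affects model size as well as accuracy.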