The present study aims to predict the speech fluency of children using automatic acoustic measures derived from forward-backward divergence segmentation (FBDS). Thirteen Korean children were recorded while reading a set of sentences aloud. Three native Korean speakers rated the fluency of each sentence on a five-point scale. An FBDS algorithm was used to segment the speech recordings into sub-phonemic units and silent segments. In addition to the low-level acoustic features derived directly from FBDS segments, higher-level acoustic features were computed by clustering FBDS segments into pseudo-syllables and silent breaks. Both low- and higher-level features were used to predict average fluency ratings, using a leave-one-speaker-out cross-validation scheme and three regression models: multiple linear regression, support vector regression, and a random-forest regressor. Highly accurate predictions were achieved, with average root-mean-square errors (RMSEs) as low as 0.3. Prediction accuracy did not differ significantly across regression models. Using higher-level features yielded lower RMSEs than using raw FBDS features. The results of a multiple linear regression using higher-level features (R² = 0.94) suggest that speech/silence ratio and pseudo-syllable rate are the two most important predictors of speech fluency.
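The leave-one-speaker-out scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes a feature matrix `X` (one row per sentence), a vector `y` of mean fluency ratings, and a `speakers` label per row. These names, the ordinary-least-squares model, and the synthetic data are illustrative assumptions, not the study's actual pipeline or data.

```python
import numpy as np

def loso_rmse(X, y, speakers):
    """Average per-speaker RMSE under leave-one-speaker-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    speakers = np.asarray(speakers)
    rmses = []
    for spk in np.unique(speakers):
        test = speakers == spk   # all sentences of the held-out speaker
        train = ~test            # sentences of every other speaker
        # Fit ordinary least squares (with intercept) on the training speakers.
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Predict the held-out speaker's ratings and score them.
        A_test = np.column_stack([np.ones(test.sum()), X[test]])
        pred = A_test @ coef
        rmses.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return float(np.mean(rmses))

# Synthetic stand-in data (hypothetical): 13 speakers, 10 sentences each,
# two features playing the role of speech/silence ratio and pseudo-syllable rate.
rng = np.random.default_rng(0)
speakers = np.repeat(np.arange(13), 10)
X = rng.uniform(0.0, 1.0, size=(130, 2))
y = 1.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0.0, 0.1, size=130)
print(f"mean LOSO RMSE: {loso_rmse(X, y, speakers):.3f}")
```

Holding out all sentences of one speaker at a time means no speaker contributes to both fitting and evaluation, so the error reflects generalization to unseen speakers rather than to unseen sentences.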
This study investigates the possibility of using automatic acoustic measures to assess speech fluency in the context of second language (L2) acquisition. To this end, three experts rated speech recordings of Japanese learners of French who were instructed to read aloud a 21-sentence text. A Forward-Backward Divergence Segmentation (FBDS) algorithm was used to segment the speech recordings (sentences) into acoustically homogeneous units at a subphonemic scale. The FBDS output was used, along with more classic measures such as the raw percentage of speech and the length and standard deviation of silent pauses, to estimate speech rate and the regularity of speech rate, while a formant tracking algorithm was used to estimate speech fluidity (i.e., quality of coarticulation). Finally, a stepwise multiple linear regression was computed to predict the experts' mean fluency ratings. Results show that FBDS-derived measures, the raw percentage of speech, and the standard deviation of the derivative of the first formant curve can be combined to yield accurate estimates of speakers' fluency scores (R = .92; P < .001). As only low-level signal features were used in the study, the method could also be relevant for the assessment of speakers of other target languages, as well as for the assessment of disordered speech.
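As an illustration of the stepwise regression idea, the following is a minimal forward-selection sketch. The abstract does not state the entry or removal criteria used, so the rule here (add the predictor that most increases R², stop when the gain falls below `min_gain`) is an assumption, as are the function names and the toy data.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection; min_gain is an assumed stopping threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score each candidate predictor when added to the current set.
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break  # no candidate improves the fit enough
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2

# Toy data (hypothetical): column 1 is irrelevant to the target.
idx = np.arange(20)
X_demo = np.column_stack([idx, (-1.0) ** idx, (idx * idx) % 7])
y_demo = 1.0 * X_demo[:, 0] + 3.0 * X_demo[:, 2]
chosen, r2 = forward_stepwise(X_demo, y_demo)
print("selected columns:", chosen, "R^2:", round(r2, 4))
```

Classical stepwise procedures usually rely on F-tests or p-value thresholds and may also remove predictors; the R²-gain rule above is only a simplified stand-in for that logic.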