OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.

This is an author-deposited version published in: http://oatao.univ-toulouse.fr/
Eprints ID: 13045

To link to this article: DOI: 10.1007/978-3-319-09761-9_1
URL: http://dx.doi.org/10.1007/978-3-319-09761-9_1

To cite this version: Hämäläinen, Annika and Cho, Hyongsil and Candeias, Sara and Pellegrini, Thomas and Abad, Alberto and Tjalve, Michael and Trancoso, Isabel and Dias, Michael. Automatically Recognising European Portuguese Children's Speech. (2014) In: International Conference on Computational Processing of Portuguese - PROPOR 2014, 6 October 2014 - 9 October 2014

Abstract. This paper reports findings from an analysis of errors made by an automatic speech recogniser trained and tested with 3-10-year-old European Portuguese children's speech. We expected and were able to identify frequent pronunciation error patterns in the children's speech. Furthermore, we were able to correlate some of these pronunciation error patterns and automatic speech recognition errors. The findings reported in this paper are of phonetic interest but will also be useful for improving the performance of automatic speech recognisers aimed at children representing the target population of the study.

Keywords: Automatic speech recognition, children's speech, error analysis, European Portuguese, fricatives, pronunciation, vowel formants.
Introduction

Speech interfaces have tremendous potential in the education of children. Speech provides a natural modality for child-computer interaction and can, at its best, contribute to a fun, motivating and engaging way of learning [1]. However, it is well known that automatically recognising children's speech is a very challenging task. Recognisers trained on adult speech tend to perform substantially worse when used by children [1][2][3][4][5][6]. Moreover, word error rates (WERs) on children's speech are usually much higher than those on adult speech, even when using a recogniser trained on children's speech, and they show a gradual decrease as the children get older [1][2][3][4][5][6][7].

The difficulty of automatically recognising children's speech can be attributed to it being acoustically and linguistically very different from adult speech [1,2]. For instance, due to their smaller vocal tracts, the fundamental and formant frequencies of children's speech are higher [1,2][7][8][9]. What is particularly characteristic of children's speech is its higher variability as compared with adult speech, both within and across speakers [1,2]. This variability is caused by rapid developmental changes in their anatomy, speech production etc., and manifests itself, for example, in speech rate, in the degree of spontaneity, in the frequency of disfluencies, in the values of fundamental and formant frequencies, as well as in pronunciation quality [1,2][7][8][9][10][11]. The highly variable values of acoustic parameters converge to adult levels at around 13-15 years of ...