Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficiency learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment on utterance selection indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29-26% to 10-8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment on utterance verification indicate that combining duration-related features with a likelihood ratio (LR) yields an equal error rate (EER) of 10.3%, which is significantly better than the EER for the other measures in isolation.
EURASIP Journal on Audio, Speech, and Music Processing
distinction between phonemes in the target language, thus producing one phoneme instead of two distinct ones. This is the case with many non-native speakers of English, for instance, Germans [6], who have difficulty in realizing the distinction between the English phonemes /ae/ and /e/ and often produce /e/ when /ae/ should be used, or Japanese speakers of English who have difficulty in distinguishing /l/ and /r/ [7] and may end up producing sounds that are neither an English /l/ nor an English /r/. In such cases, confusion may arise because distinct words will be realized in the same way. This can also happen when speech sounds are inappropriately deleted or inserted, which is another common phenomenon in non-native speech [8].
With respect to morphology and syntax, the speech of non-natives may also exhibit deviations from that of native speakers [9]. At the level of morphology, they may find it difficult to produce correct forms of verbs, nouns, adjectives, articles, and so forth, especially when the morphological distinction hinges on subtle phonetic distinctions, such as the presence of a plosive or fricative sound in consonant clusters or the distinction between two similar vowels (lead versus led). Irregular verbs and nouns may also pose serious problems, resulting in the production of nonexistent regularized forms. Deviations in syntax may concern the structure of sentences, the ordering of constituents, and their omission or insertion. As to vocabulary, non-native speakers also tend to have a limited and often deviant lexicon. Finally, non-native speech exhibits more disfluencies and hesitation phenomena than native speech and is characterized by a lower speech rate [10][11][12][13][14].
All these problems are compounded when dealing with the speech of non-native speakers who are still in the process of learning the language.
In general, the degree of deviation from native speech and the incidence of disfluencies will be in inverse relation to the degree...