Interspeech 2021
DOI: 10.21437/interspeech.2021-2140

“You don’t understand me!”: Comparing ASR Results for L1 and L2 Speakers of Swedish

Abstract: The performance of Automatic Speech Recognition (ASR) systems has steadily improved in state-of-the-art development. However, performance tends to decrease considerably in more challenging conditions (e.g., background noise, multi-speaker social conversations) and with more atypical speakers (e.g., children, non-native speakers or people with speech disorders), which indicates that general improvements do not necessarily transfer to applications that rely on ASR, e.g., educational software for younger st…

Cited by 13 publications (7 citation statements)
References 15 publications
“…Furthermore, as the interaction between the robot and learners depends, for the most part, on the speech technology used in the dialogue system, we evaluated available Automatic Speech Recognition systems (ASR) to compare their accuracy with L1 and L2 speakers of Swedish [19]. Our findings corroborated that ASR performance for non-native speakers is worse than for L1 speakers, increasing up to a double of the Word Error Rate (WER) of native speakers in social conversations.…”
Section: Past Present and Future Work (mentioning, confidence: 66%)
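The statement above reports that WER for non-native speakers reached up to double that of native speakers. WER is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the ASR hypothesis into the reference transcript, and N is the number of reference words. The Python sketch below computes it with a word-level Levenshtein distance. It is a generic illustration, not the evaluation pipeline used in the paper: the function name `word_error_rate` is hypothetical, tokenization is plain whitespace splitting, and no text normalization (casing, punctuation, numbers) is applied. The Swedish transcripts are invented for illustration and are not data from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization and no normalization; N is the
    number of words in the reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (one row behind the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "jag vill ha en kopp kaffe"
    # Made-up ASR outputs for illustration only (not results from the paper).
    hyp_a = "jag vill ha en kaffe"  # one deletion
    hyp_b = "ja vill ha en kaffe"   # one substitution plus one deletion
    wer_a = word_error_rate(reference, hyp_a)
    wer_b = word_error_rate(reference, hyp_b)
    print(f"WER A: {wer_a:.3f}")              # WER A: 0.167
    print(f"WER B: {wer_b:.3f}")              # WER B: 0.333
    print(f"B/A ratio: {wer_b / wer_a:.1f}")  # B/A ratio: 2.0
```

Because insertions count as errors while N stays fixed, WER is not capped at 100%; a hypothesis full of spurious words can exceed 1.0. Missing short utterances entirely, as the later citation statements describe, shows up as deletions.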
“…Yet it is precisely such items that are woefully underrepresented in the data underlying most current language models (Prevot et al, 2019). It is little surprise that conversational agents have a hard time dealing with informal conversational style (Hoegen et al, 2019) and building social bonds (Cassell, 2020), and that speech recognition easily mixes up interjections with opposite pragmatic functions (Zayats et al, 2019) if it doesn't miss them altogether (Cumbal et al, 2021).…”
Section: The Natural Habitat Of Language (mentioning, confidence: 99%)
“…And indeed there are indications that ASR performs less well for such data. One study comparing Google, Microsoft and HuggingFace ASR models for Swedish found that "for all spontaneous speech, the ASRs frequently fail to produce a transcription for short utterances" (Cumbal et al, 2021). Losing or incorrectly transcribing short utterances may not be a big problem for speech recognition models whose main function is to deal with relatively clean recordings of non-conversational speech (such as speeches, radio programs, parliamentary meetings and other highly institutionalized text types).…”
Section: Conversational vs ASR Corpora (mentioning, confidence: 99%)