Interspeech 2021
DOI: 10.21437/interspeech.2021-1908

Revisiting Parity of Human vs. Machine Conversational Speech Transcription

Citations: cited by 4 publications (8 citation statements)
References: 12 publications
“…Speech scientists have long worked to supplement word error rate with more informative measures, including error analyses of overlap (Çetin and Shriberg, 2006), disfluencies (Goldwater et al, 2010), and conversational words (Zayats et al, 2019; Mansfield et al, 2021). This work has shown the importance of in-depth error analysis, and also brings home the multi-faceted challenges of truly interactive speech-to-text systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…The most widely used metric for comparison is word error rate, whose main attraction - simplicity - is also its most important pitfall. Here we build on prior work calling for error analysis beyond WER (Mansfield et al, 2021; Zayats et al, 2019) and extend it by looking at multiple languages and considering aspects of timing, confidence, conversational words, and dialog acts.…”
Section: Introduction (mentioning)
confidence: 99%
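
For readers unfamiliar with the metric discussed in the statement above, here is a minimal sketch of word error rate as it is commonly defined: the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. This is not the scoring pipeline of the cited paper or of any citing work, and real scoring tools apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    text normalization (an assumption, not a standard scoring setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# A dropped conversational word and a misrecognized content word
# cost the same under this metric.
print(word_error_rate("yeah i think so", "i think so"))
print(word_error_rate("yeah i think so", "yeah i sink so"))
```

Both calls give the same score, since each hypothesis differs from the reference by a single word-level edit. That is one way to see why a single number can hide which kind of error occurred.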
“…A growing amount of work seeks to model feedback behaviour in human-agent interaction, including by means of response token generation [13,14] and attentive listening systems [15,16]. Despite considerable progress, the place of response tokens in speech technology is by no means settled: they tend to be missed by speech recognizers [17,18,19] and dialog managers have a hard time dealing with them [20], showing that they remain a key issue on which progress towards future generations of voice-interactive technologies and conversational user interfaces depends. Observational work on forms and functions of response tokens in human interaction is an important empirical foundation of any speech technology intended for human use.…”
Section: Related Work (mentioning)
confidence: 99%
“…Human transcribers disagree on this very difficult task [4,5,6,12,13], so it should not be surprising that there are inevitably some errors in these reference transcripts and mappings. For this work, a professional linguist was commissioned to very carefully audit and correct these references.…”
Section: Corrected Reference Files (mentioning)
confidence: 99%
“…A careful analysis in [6] notes that "humans are more likely to miss words than to misrecognize them", and is notable in several regards: code was provided to specify a non-standard data cleaning and text normalization process, while output from a research system was re-scored in an (unsuccessful) attempt to replicate a published result. Our work continues in this effort to fully describe and improve upon the standard scoring methodology, sharing data and software to enable reproducible results.…”
Section: Introduction (mentioning)
confidence: 99%