Speech, Voice, Text, and Meaning

Hessen, Arjan van; Calamai, Silvia; Heuvel, H. van den; Scagliola, Stefania; Karrouche, Norah; Beeken, Jeannine; Corti, Louise; Draxler, Christoph

doi:10.1145/3382507.3420054

Cited by 2 publications

(1 citation statement)

References 1 publication

“…Even in recent times, a comparatively high word error rate (WER) on the oral history domain, compared to other ASR tasks, characterizes most works. Hessen et al (2013) describe the use of ASR to transcribe Dutch oral history archives. The authors state the WER is above 40 % for Dutch oral history interviews at the time of publishing.…”

Section: Related Work (mentioning)

confidence: 99%

Human and Automatic Speech Recognition Performance on German Oral History Interviews

Gref, Matthiesen, Schmidt et al. 2022

Preprint

Automatic speech recognition systems have accomplished remarkable improvements in transcription accuracy in recent years. On some domains, models now achieve near-human performance. However, transcription performance on oral history has not yet reached human accuracy. In the present work, we investigate how large this gap between human and machine transcription still is. For this purpose, we analyze and compare transcriptions of three humans on a new oral history data set. We estimate a human word error rate of 8.7 % for recent German oral history interviews with clean acoustic conditions. For comparison with recent machine transcription accuracy, we present experiments on the adaptation of an acoustic model achieving near-human performance on broadcast speech. We investigate the influence of different adaptation data on robustness and generalization for clean and noisy oral history interviews. We optimize our acoustic models by 5 to 8 % relative for this task and achieve 23.9 % WER on noisy and 15.6 % word error rate on clean oral history interviews.
