Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.345

How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems

Abstract: In this work, we present a detailed analysis of how accent information is reflected in the internal representation of speech in an end-to-end automatic speech recognition (ASR) system. We use a state-of-the-art end-to-end ASR system, comprising convolutional and recurrent layers, that is trained on a large amount of US-accented English speech, and evaluate the model on speech samples from seven different English accents. We examine the effects of accent on the internal representation using three main probing techniques…
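As a rough illustration of what probing an end-to-end ASR model's internal representation can look like, the sketch below extracts per-layer activations from a toy DeepSpeech2-style model (convolutional front-end plus recurrent layers), mean-pools them over time, and trains a simple logistic-regression probe to predict the accent label. The model, layer names, and data are hypothetical stand-ins, not the paper's actual code or setup.

```python
# Minimal probing sketch (illustrative; model, layer names, and data are hypothetical).
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

class TinyDeepSpeech(nn.Module):
    """Toy stand-in for a DeepSpeech2-style model: conv front-end + recurrent layers."""
    def __init__(self, n_mels=80, hidden=256, n_chars=29):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=11, stride=2, padding=5),
            nn.ReLU(),
        )
        self.rnn1 = nn.GRU(hidden, hidden, batch_first=True)
        self.rnn2 = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_chars)

    def forward(self, feats):                 # feats: (batch, n_mels, time)
        x = self.conv(feats).transpose(1, 2)  # -> (batch, time', hidden)
        h1, _ = self.rnn1(x)
        h2, _ = self.rnn2(h1)
        logits = self.out(h2)
        # Return intermediate representations so they can be probed.
        return logits, {"conv": x, "rnn1": h1, "rnn2": h2}

def utterance_vectors(model, batch_feats, layer):
    """Mean-pool one layer's activations over time: one fixed-size vector per utterance."""
    with torch.no_grad():
        _, reps = model(batch_feats)
    return reps[layer].mean(dim=1).cpu().numpy()

# Probe: can a simple classifier recover the accent label from these vectors?
model = TinyDeepSpeech().eval()
feats = torch.randn(32, 80, 400)              # pretend batch of log-mel features
accents = [i % 7 for i in range(32)]          # pretend labels for 7 accents
X = utterance_vectors(model, feats, "rnn1")
probe = LogisticRegression(max_iter=1000).fit(X, accents)
print("probe train accuracy:", probe.score(X, accents))
```

Mean-pooling over time is just one convenient way to obtain a fixed-size utterance vector; the paper's probing techniques differ in detail.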

Cited by 17 publications (10 citation statements)
References 24 publications

Citation statements:
“…Prasad and Jyothi [11] presented a variety of in-depth approaches to analyzing DeepSpeech2 neural layers for accented speech. The authors show that the strongest gradients of the final neural ASR layer for a given word align with the timing interval for that word.…”
Section: Multi-task Learning (mentioning, confidence: 99%)
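A hedged sketch of a gradient-based analysis in the spirit of that finding: back-propagate from the logits of the output frames attributed to one decoded word and check where the gradient magnitude over the input features concentrates in time. It reuses the toy model from the sketch above; the frame span and helper function are hypothetical, not the authors' method.

```python
# Gradient attribution for one word's output-frame span (illustrative sketch).
import torch

def word_saliency(model, feats, frame_start, frame_end):
    """Per-input-frame gradient magnitude w.r.t. the logits of one word's frame span."""
    feats = feats.clone().requires_grad_(True)   # (1, n_mels, time)
    logits, _ = model(feats)                     # (1, time', n_chars)
    # Sum the strongest logit in each output frame attributed to the word.
    score = logits[0, frame_start:frame_end].max(dim=-1).values.sum()
    score.backward()
    # Saliency per input frame: L2 norm of the gradient over feature bins.
    return feats.grad[0].norm(dim=0)             # (time,)

sal = word_saliency(model, torch.randn(1, 80, 400), frame_start=50, frame_end=70)
print("input frame with strongest gradient:", int(sal.argmax()))
```

If the finding holds, the peak of this saliency curve should fall inside the time interval of the word in question.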
“…Of course, speech as an input modality can be advantageous for many disabled people such as blind individuals [4,47,136] and those with upper limb motor impairments [50,76]. However, it is still limited for dysarthric [24,76], deaf [35], and accented [99] speech, as well as for low-resource languages and noisy environments, all of which are active areas of research. With advances in speech recognition, we believe that, for many older adults, we could leverage their verbal reports as a reliable data source in an automated manner.…”
Section: Leveraging Verbal Reports As An… (mentioning, confidence: 99%)
“…One prominent line of research focuses on understanding which parts of the neural networks process and encode accent information. For example, research exploring the weights of the hidden layers of an end-to-end system DeepSpeech2 for accented speech shows that the first RNN layer contains the most information about accents [12]. It suggests that this part of an end-to-end model can be adapted to learn abstract representations that are less accent-specific.…”
Section: Previous Work (mentioning, confidence: 99%)
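A minimal layer-wise comparison in the same spirit, reusing the toy model and utterance_vectors helper from the first sketch (all hypothetical): train the same accent probe on each layer's pooled representations and compare cross-validated accuracy to see from which layer accent information is easiest to recover.

```python
# Layer-wise accent probes (illustrative sketch; reuses model, feats, accents from above).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

accent_labels = np.array(accents)
for layer in ["conv", "rnn1", "rnn2"]:
    X = utterance_vectors(model, feats, layer)
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, accent_labels, cv=4).mean()
    print(f"{layer}: accent-probe accuracy = {acc:.2f}")
```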