2021
DOI: 10.48550/arxiv.2102.10233
Preprint
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods

Abstract: The variety of accents has posed a big challenge to speech recognition. The Accented English Speech Recognition Challenge (AESRC2020) is designed to provide a common testbed and promote accent-related research. Two tracks are set in the challenge: English accent recognition (track 1) and accented English speech recognition (track 2). A set of 160 hours of accented English speech collected from 8 countries is released with labels as the training set. Another 20 hours of speech without labels is later released…

Cited by 8 publications (6 citation statements) | References 35 publications
“…In particular, the MTJR method achieves 75.2% average accent recognition accuracy on the Test set, while the STJR method only achieves 72.2%, when the overall data is used for the pretraining. Such an accent recognition result places us in second position out of more than 70 participants [7]; the best result [3] was obtained with 60-fold data augmentation as well as feature-based system fusion. For the STJR method, as in [25, 26], we note that we conducted two kinds of joint training recipes: either output an accent label before the actual ASR output, or append an accent label after the final ASR output.…”
Section: Results
confidence: 90%
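The two joint-training recipes contrasted in the statement above (emitting the accent label before the ASR hypothesis, or appending it after) amount to different placements of an accent tag token in the decoder target sequence. Below is a minimal sketch of how such targets could be constructed; the tag names and helper function are hypothetical illustrations, not the cited system's code.

```python
# Minimal sketch (hypothetical helper, not the cited system's code): build
# joint ASR + accent-recognition targets by placing an accent tag token either
# before or after the transcript tokens.

ACCENT_TAGS = ["<US>", "<UK>", "<CHN>", "<IND>", "<JPN>", "<KR>", "<PT>", "<RU>"]

def make_joint_target(transcript_tokens, accent_tag, accent_first=True):
    """Return the decoder target for single-sequence joint training.

    accent_first=True  -> the accent tag precedes the transcript.
    accent_first=False -> the accent tag is appended after the transcript.
    """
    if accent_tag not in ACCENT_TAGS:
        raise ValueError(f"unknown accent tag: {accent_tag}")
    tokens = list(transcript_tokens)
    return [accent_tag] + tokens if accent_first else tokens + [accent_tag]

# Example usage for a US-accented utterance.
tokens = ["the", "weather", "is", "nice"]
print(make_joint_target(tokens, "<US>", accent_first=True))   # tag first
print(make_joint_target(tokens, "<US>", accent_first=False))  # tag last
```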
“…In this paper, we attempt an E2E-based multi-task learning approach to conduct joint speech and accent recognition simultaneously. Our work is motivated by our participation in a workshop challenge [7]. The challenge has two tracks: speech recognition of English in eight accents, and the corresponding accent classification.…”
Section: Introduction
confidence: 99%
“…language processing [Kitashov et al., 2018; Shi et al., 2021]. On the other hand, the larger SD for the human output may point to the fact that humans tend to have a high degree of variance in performance due to differences in background knowledge, skills, etc.…”
Section: Results
confidence: 99%
“…The accent embeddings are obtained from the penultimate layer output of the AID model. More details about the AID model can be found in our accent identification system description [12] for the AESRC 2020 challenge [2].…”
Section: Accent Identification and Embedding Extraction
confidence: 99%
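The statement above describes taking accent embeddings from the penultimate layer of the AID model. Below is a minimal sketch, assuming a PyTorch classifier, of capturing penultimate-layer activations with a forward hook; the toy architecture, layer names, and dimensions are assumptions for illustration, not the cited AID model.

```python
# Minimal sketch (assumed PyTorch model, not the cited AID system): extract an
# accent embedding from the penultimate layer of an accent-identification
# classifier via a forward hook.
import torch
import torch.nn as nn

class ToyAIDModel(nn.Module):
    """Stand-in AID classifier: encoder -> penultimate layer -> accent logits."""
    def __init__(self, feat_dim=80, embed_dim=256, num_accents=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU())
        self.penultimate = nn.Sequential(nn.Linear(512, embed_dim), nn.ReLU())
        self.classifier = nn.Linear(embed_dim, num_accents)

    def forward(self, x):
        return self.classifier(self.penultimate(self.encoder(x)))

model = ToyAIDModel().eval()

# Capture the penultimate-layer output whenever the model runs forward.
captured = {}
model.penultimate.register_forward_hook(
    lambda module, inputs, output: captured.update(embedding=output.detach())
)

# One utterance represented by a frame-averaged feature vector (batch of 1).
features = torch.randn(1, 80)
with torch.no_grad():
    logits = model(features)              # accent posteriors come from this head
accent_embedding = captured["embedding"]  # shape: (1, 256)
print(accent_embedding.shape)
```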
“…One of the most pressing needs for ASR today is support for multiple accents in a single system, which is often referred to as multi-accent speech recognition in the literature. The difficulties of recognizing accented speech, which differs in phonology, vocabulary, and grammar, have posed a serious challenge to current ASR systems [2]. A straightforward method is to build a single ASR model from mixed data (accented speech from non-native speakers and standard data from native speakers).…”
Section: Introduction
confidence: 99%