2019
DOI: 10.48550/arxiv.1910.12094
Preprint

Meta Learning for End-to-End Low-Resource Speech Recognition

Cited by 2 publications (4 citation statements)
References 0 publications

“…Following the previous works [22,23], we used a CNN-BiLSTM-Head structure as the multilingual ASR model, as shown in Figure 2(a), and adopted Connectionist Temporal Classification (CTC) [26] loss as the objective function. The baseline model architecture followed the previous work [23], where the CNN module was a 6-layer VGG block as shown in Figure 2(b), and the BiLSTM module was a 3-layer bidirectional LSTM network with 360 cells in each direction. We experimented with the channel number of convolutions in VGG as 128 or 512, and the results of these two settings in the following subsection were named as VGG-Small and VGG-Large.…”
Section: Implementation Details (mentioning)
confidence: 99%
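
The architecture quoted above can be made concrete with a short sketch. The following is a minimal PyTorch rendering, not the cited authors' code: the 6-layer VGG-style CNN front end, the 3-layer BiLSTM with 360 cells per direction, and the CTC objective come from the statement, while the class name VGGBiLSTMCTC, the 80-dimensional input features, the vocabulary size, and the pooling placement are illustrative assumptions.

# Minimal PyTorch sketch of the CNN-BiLSTM-Head layout with CTC loss described above.
# Layer counts and sizes follow the quoted statement; names, feature dimension,
# vocabulary size, and pooling placement are illustrative assumptions.
import torch
import torch.nn as nn


class VGGBiLSTMCTC(nn.Module):
    def __init__(self, n_mels=80, vgg_channels=128, vocab_size=100):
        super().__init__()
        # 6 convolutional layers in a VGG-style block ("VGG-Small" uses 128
        # channels, "VGG-Large" 512), with two 2x downsampling steps.
        c = vgg_channels
        self.cnn = nn.Sequential(
            nn.Conv2d(1, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
        )
        # 3-layer bidirectional LSTM with 360 cells in each direction.
        self.blstm = nn.LSTM(
            input_size=c * (n_mels // 4),
            hidden_size=360,
            num_layers=3,
            bidirectional=True,
            batch_first=True,
        )
        # Head projects to the output symbol set plus the CTC blank.
        self.head = nn.Linear(2 * 360, vocab_size + 1)

    def forward(self, feats):
        # feats: (batch, time, n_mels) log-Mel features.
        x = self.cnn(feats.unsqueeze(1))         # (B, C, T/4, n_mels/4)
        x = x.permute(0, 2, 1, 3).flatten(2)     # (B, T/4, C * n_mels/4)
        x, _ = self.blstm(x)
        return self.head(x).log_softmax(dim=-1)  # CTC expects log-probabilities


# CTC training step on dummy data, just to show how the loss is wired up.
model = VGGBiLSTMCTC()
feats = torch.randn(4, 400, 80)
targets = torch.randint(1, 101, (4, 30))
log_probs = model(feats).transpose(0, 1)         # (T, B, vocab+1) for nn.CTCLoss
loss = nn.CTCLoss(blank=0)(
    log_probs,
    targets,
    input_lengths=torch.full((4,), log_probs.size(0)),
    target_lengths=torch.full((4,), 30),
)
loss.backward()

Swapping vgg_channels between 128 and 512 reproduces the VGG-Small / VGG-Large distinction mentioned in the statement.
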
“…Following the previous works [17,19,22,23], we conducted experiments on the multilingual dataset, IARPA BABEL [24]. The experiment results show that our approach outperformed the baseline fixed-topology architecture by 10.2% and 10.0% relative reduction on character error rates (CER) under monolingual and multilingual ASR settings respectively.…”
Section: Introduction (mentioning)
confidence: 97%
“…It is known that multilingual training or pre-training with related languages improves low-resource end-to-end ASR significantly [16,17,18,19]. Meta learning methods [38] have recently been introduced to improve the efficiency of multilingual pre-training. Besides cross-lingual transfer learning, leveraging auxiliary data is another approach to improve low-resource ASR, for example, incorporating (synthetic) text translation data as additional inputs [21,20,22] or co-training with weakly supervised data [39] or text-to-speech (TTS) data [40].…”
Section: Related Work (mentioning)
confidence: 99%
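
As a rough illustration of the meta-learning pre-training mentioned above, here is a hedged first-order MAML sketch over per-language tasks in PyTorch. It is not the recipe of [38] or of the indexed paper: the tiny classifier, sample_language_batch, the language names, and the learning rates are placeholders, and a faithful setup would plug in an ASR model and CTC loss like the one sketched earlier; only the inner-adaptation / outer-update pattern is the point.

# Hedged sketch of first-order MAML over per-language tasks, as a stand-in for
# meta pre-training on multilingual data. The model, data sampler, language
# list, and learning rates are illustrative, not the cited papers' settings.
import copy
import torch
import torch.nn as nn


def sample_language_batch(lang, batch_size=8, feat_dim=40, num_classes=20):
    # Stand-in for drawing a mini-batch from one language's training data.
    x = torch.randn(batch_size, feat_dim)
    y = torch.randint(0, num_classes, (batch_size,))
    return x, y


model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 20))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
languages = ["bengali", "tagalog", "zulu"]  # example pre-training languages

for step in range(100):
    meta_opt.zero_grad()
    for lang in languages:
        # Inner loop: adapt a copy of the shared initialization to one language.
        fast = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(fast.parameters(), lr=1e-2)
        x, y = sample_language_batch(lang)
        inner_opt.zero_grad()
        loss_fn(fast(x), y).backward()
        inner_opt.step()

        # Outer loop (first-order approximation): evaluate the adapted copy on
        # held-out data and accumulate its gradients into the shared init.
        xq, yq = sample_language_batch(lang)
        fast.zero_grad()
        loss_fn(fast(xq), yq).backward()
        with torch.no_grad():
            for p, fp in zip(model.parameters(), fast.parameters()):
                p.grad = fp.grad.clone() if p.grad is None else p.grad + fp.grad
    meta_opt.step()

The key design point this illustrates is that the shared initialization is updated from losses measured after per-language adaptation, which is what distinguishes meta pre-training from plain multilingual joint training.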