QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions
2019 · Preprint
DOI: 10.48550/arxiv.1910.10261

Cited by 13 publications (20 citation statements)
References 12 publications
“…Table 2 compares the WER results of our model on LibriSpeech test-clean/test-other with a few state-of-the-art models, including ContextNet [10], Transformer Transducer [7], and QuartzNet [9]. All our evaluation results are rounded to one digit after the decimal point.…”
Section: Results on LibriSpeech
confidence: 99%
“…Recently, the Transformer architecture based on self-attention [6,7] has enjoyed widespread adoption for sequence modeling due to its ability to capture long-distance interactions and its high training efficiency. Alternatively, convolutions, which capture local context progressively via a local receptive field layer by layer, have also been successful for ASR [8,9,10,11,12].…”
Section: Introduction
confidence: 99%
“…The TDS architecture we use here also results in much lighter-weight models that still achieve low WER. Other architectures, such as the time-channel separable convolution, have also shown similar gains in computational efficiency with little if any hit to accuracy [21].…”
Section: Related Work
confidence: 99%
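The efficiency gain referred to above comes from factoring a full 1D convolution into a depthwise convolution over the time axis followed by a pointwise (1×1) convolution that mixes channels. Below is a minimal PyTorch sketch of such a time-channel separable layer; the class name, channel counts, and kernel size are illustrative assumptions, not QuartzNet's actual configuration.

```python
import torch
import torch.nn as nn

class TimeChannelSeparableConv1d(nn.Module):
    """Illustrative 1D time-channel separable convolution:
    a depthwise conv over time followed by a pointwise (1x1) conv over channels."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int):
        super().__init__()
        # Depthwise: each channel is convolved independently over time.
        self.depthwise = nn.Conv1d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels, bias=False)
        # Pointwise: kernel-size-1 conv mixes information across channels.
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))

# Illustrative parameter comparison (hypothetical layer sizes):
# a standard Conv1d(256, 256, 33) has 256*256*33 ≈ 2.16M weights,
# while the separable version has 256*33 + 256*256 ≈ 74K weights.
conv = TimeChannelSeparableConv1d(256, 256, 33)
x = torch.randn(1, 256, 100)  # (batch, channels, time)
print(conv(x).shape)          # torch.Size([1, 256, 100])
```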
“…In the case of the Oracle Teacher, we trained the model for 30 epochs on a single 12 GB GPU, which took about 22 hours (≈ 1 day) to finish training. Considering that the reported training of QuartzNet [67], which is more computationally efficient than Jasper DR, took 122 hours (≈ 5 days) for 400 epochs on eight 32 GB GPUs with a batch size of 256, the Oracle Teacher dramatically reduced the computational cost of the teacher model.…”
Section: B. Computational Cost Comparison
confidence: 99%
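Taking the figures in the statement above at face value, a rough GPU-hour comparison (ignoring the difference between 12 GB and 32 GB GPUs and per-GPU throughput) illustrates the claimed saving:

```python
# Rough GPU-hour comparison based on the numbers quoted above
# (ignores differences in GPU model, memory, and per-GPU throughput).
quartznet_gpu_hours = 8 * 122       # eight GPUs x 122 hours = 976 GPU-hours
oracle_teacher_gpu_hours = 1 * 22   # one GPU x 22 hours = 22 GPU-hours
print(quartznet_gpu_hours / oracle_teacher_gpu_hours)  # ~44x fewer GPU-hours
```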