“…Moreover, on high-end devices equipped with dedicated processors, CTC with modern architectures (e.g., Transformer [18,19] and Conformer [20,21]) enables fast parallel inference, owing to the non-autoregressive property of CTC. In addition, while CTC has been regarded as weaker than encoder-decoder models, various techniques for improving CTC have been developed, including pretraining [12] and regularization [22,23]. Finally, there is active research on CTC variants [12] and on non-autoregressive modeling based on CTC [24,25,26], suggesting that effective pruning for CTC can also be applied to such variants and non-autoregressive models.…”
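To make the non-autoregressive property concrete, the following minimal sketch illustrates greedy CTC decoding: each frame's label is obtained by an independent argmax over the vocabulary, so all frames can be processed in parallel before a simple collapse of repeats and blanks. The `ctc_greedy_decode` helper and its toy inputs are hypothetical illustrations, not taken from any cited system.

```python
# Minimal sketch of greedy CTC decoding (illustrative only).
# The per-frame argmax has no dependence on previously emitted tokens,
# which is why CTC inference is non-autoregressive and parallelizable.
import numpy as np

def ctc_greedy_decode(log_probs: np.ndarray, blank: int = 0) -> list[int]:
    """log_probs: (T, V) frame-level log-probabilities from a CTC model."""
    best = log_probs.argmax(axis=-1)  # one argmax over all T frames at once
    # Collapse consecutive repeats, then drop blank symbols.
    return [int(p) for i, p in enumerate(best)
            if p != blank and (i == 0 or p != best[i - 1])]

# Toy example: 6 frames, vocabulary {blank=0, a=1, b=2}
log_probs = np.log(np.array([
    [0.1, 0.8, 0.1],   # a
    [0.1, 0.8, 0.1],   # a (repeat, merged)
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.8],   # b
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.8, 0.1],   # a
]))
print(ctc_greedy_decode(log_probs))  # [1, 2, 1] -> "aba"
```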