Interspeech 2021
DOI: 10.21437/interspeech.2021-1171

Layer Pruning on Demand with Intermediate CTC

Abstract: Deploying an end-to-end automatic speech recognition (ASR) model on mobile/embedded devices is a challenging task, since the device's computational power and energy consumption requirements change dynamically in practice. To overcome this issue, we present a training and pruning method for ASR based on connectionist temporal classification (CTC) which allows the model depth to be reduced at run time without any extra fine-tuning. To achieve this goal, we adopt two regularization methods, intermediate CTC and s…
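The abstract's core idea, attaching an auxiliary CTC loss to an intermediate encoder layer so the upper layers can be dropped at run time, can be illustrated with a minimal PyTorch-style sketch. The module names, the 12-layer stack with a tap at layer 6, and the loss weight `lambda_inter` below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal PyTorch-style sketch: an encoder trained with both a final and an
# intermediate CTC loss, so the top of the stack can be pruned at run time
# without fine-tuning. Layer counts, the tap position, and lambda_inter are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class PrunableCTCEncoder(nn.Module):
    def __init__(self, num_layers=12, d_model=256, vocab_size=100, tap_layer=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(num_layers)])
        self.tap_layer = tap_layer                       # intermediate CTC branch
        self.ctc_head = nn.Linear(d_model, vocab_size)   # shared CTC projection

    def forward(self, x, keep_layers=None):
        """keep_layers < num_layers drops the upper layers at inference time."""
        out = {}
        n = keep_layers or len(self.layers)
        for i, layer in enumerate(self.layers[:n]):
            x = layer(x)
            if (i + 1) == self.tap_layer:
                out["inter"] = self.ctc_head(x).log_softmax(-1)
        out["final"] = self.ctc_head(x).log_softmax(-1)
        return out

# Training objective: weighted sum of the final and intermediate CTC losses.
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def loss_fn(out, targets, input_lens, target_lens, lambda_inter=0.3):
    final = ctc(out["final"].transpose(0, 1), targets, input_lens, target_lens)
    inter = ctc(out["inter"].transpose(0, 1), targets, input_lens, target_lens)
    return (1 - lambda_inter) * final + lambda_inter * inter

# On a weaker device: encoder(x, keep_layers=6)["final"] feeds a standard
# CTC decoder, reusing the shared head with no extra fine-tuning.
```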

Cited by 7 publications (5 citation statements). References 28 publications.
“…Prior research for Transformer based speech processing models has largely evolved into two categories: 1) architecture compression methods that aim to minimize the Transformer model structural redundancy measured by their depth, width, sparsity, or their combinations using techniques such as pruning [8][9][10], low-rank matrix factorization [11,12] and distillation [13,14]; and 2) low-bit quantization approaches that use either uniform [15][16][17][18], or mixed precision [12,19] settings. A combination of both architecture compression and low-bit quantization approaches has also been studied to produce larger model compression ratios [12].…”
Section: Introduction
confidence: 99%
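As a concrete instance of one technique named in the quotation above, the sketch below applies low-rank matrix factorization to a trained linear layer via truncated SVD. The rank and layer shapes are illustrative, and the cited works may factorize different components or combine this with pruning and quantization.

```python
# Hedged sketch of low-rank matrix factorization: replace a trained linear
# layer's weight W (out x in) with B (out x r) @ A (r x in) via truncated SVD.
# The rank and layer size below are illustrative assumptions.
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    W = layer.weight.data                        # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = torch.diag(S[:rank]) @ Vh[:rank]         # (rank, in_features)
    B = U[:, :rank]                              # (out_features, rank)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(A)
    second.weight.data.copy_(B)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)

# Parameter count drops from out*in to r*(out+in): a 1024x1024 projection at
# rank 128 keeps roughly 25% of the original weights.
full = nn.Linear(1024, 1024)
compressed = factorize_linear(full, rank=128)
```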
“…The commonly adopted approach requires each target compressed system with the desired size to be individually constructed, for example, in [14,15,17] for Conformer models, and similarly for SSL foundation models such as DistilHuBERT [23], FitHuBERT [24], DPHuBERT [31], PARP [20], and LightHuBERT [30] (no more than 3 systems of varying complexity were built). 2) limited scope of system complexity attributes covering only a small subset of architecture hyper-parameters based on either network depth or width alone [8,9,11,35,36], or both [10,13,14,37], while leaving out the task of low-bit quantization, or vice versa [15][16][17][18][19][32][33][34]. This is particularly the case with the recent HuBERT model distillation research [23][24][25][28][29][30][31] that are focused on architectural compression alone.…”
Section: Introduction
confidence: 99%
“…However, it comes at the cost of higher computational resources and memory consumption. It is hypothesized that some of the many layers might be redundant and have little contribution to the overall system performance [26]. This motivates the present study to inspect the redundancy among layers and perform layer-level structured pruning, i.e., layer pruning (LP), for simplifying deep models.…”
Section: Introduction
confidence: 99%
“…Inspired by [26], if two layers' outputs are similar, the layers between them are assumed to be redundant and can be discarded. In the present study, we propose the Correlation Measure based Fast Search on Layer Pruning (CoMFLP).…”
Section: Introduction
confidence: 99%
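The redundancy criterion described in the quotation above can be sketched as follows: score the similarity of two layers' outputs on some probe data and treat the span between highly correlated layers as a pruning candidate. The Pearson-style correlation over flattened hidden states used here is an illustrative stand-in, not necessarily CoMFLP's exact measure.

```python
# Hedged sketch of the quoted redundancy idea: if layer i's and layer j's
# outputs correlate strongly, the layers between them are pruning candidates.
# The similarity measure is an illustrative stand-in, not CoMFLP's exact metric.
import torch

def layer_output_similarity(h_i: torch.Tensor, h_j: torch.Tensor) -> float:
    """h_i, h_j: hidden states of shape (batch, time, dim) from layers i and j."""
    a = h_i.reshape(-1).float()
    b = h_j.reshape(-1).float()
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / (a.norm() * b.norm() + 1e-8))

def redundant_spans(hidden_states, threshold=0.95):
    """Return (i, j) pairs whose outputs correlate above the threshold."""
    pairs = []
    for i in range(len(hidden_states)):
        for j in range(i + 1, len(hidden_states)):
            if layer_output_similarity(hidden_states[i], hidden_states[j]) > threshold:
                pairs.append((i, j))   # layers i+1 .. j are candidates to discard
    return pairs
```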