“…Moreover, on high-end devices equipped with dedicated processors, CTC with modern architectures (e.g., Transformer [18,19] and Conformer [20,21]) enables fast parallel inference, owing to the non-autoregressive property of CTC. In addition, while CTC has been regarded as weaker than encoder-decoder models, various techniques for improving CTC have been developed, including pretraining [12] and regularization [22,23]. Finally, there is active research on CTC variants [12] and on non-autoregressive modeling based on CTC [24,25,26], suggesting that effective pruning for CTC can also be applied to such variants and non-autoregressive models.…”
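To make the non-autoregressive property concrete, the following minimal sketch illustrates greedy CTC decoding: each frame's label is obtained by an independent argmax over the vocabulary, so all frames can be processed in parallel before a simple collapse of repeats and blanks. The `ctc_greedy_decode` helper and its toy inputs are hypothetical illustrations, not taken from any cited system.

```python
# Minimal sketch of greedy CTC decoding (illustrative only).
# The per-frame argmax has no dependence on previously emitted tokens,
# which is why CTC inference is non-autoregressive and parallelizable.
import numpy as np

def ctc_greedy_decode(log_probs: np.ndarray, blank: int = 0) -> list[int]:
    """log_probs: (T, V) frame-level log-probabilities from a CTC model."""
    best = log_probs.argmax(axis=-1)  # one argmax over all T frames at once
    # Collapse consecutive repeats, then drop blank symbols.
    return [int(p) for i, p in enumerate(best)
            if p != blank and (i == 0 or p != best[i - 1])]

# Toy example: 6 frames, vocabulary {blank=0, a=1, b=2}
log_probs = np.log(np.array([
    [0.1, 0.8, 0.1],   # a
    [0.1, 0.8, 0.1],   # a (repeat, merged)
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.8],   # b
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.8, 0.1],   # a
]))
print(ctc_greedy_decode(log_probs))  # [1, 2, 1] -> "aba"
```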