“…E2E ST opens the way to bridging the modality gap directly, but it is data-hungry, sample-inefficient, and often underperforms cascade models, especially in low-resource settings (Bansal et al., 2018). This has led researchers to explore solutions ranging from efficient neural architecture design (Karita et al., 2019; Sung et al., 2019) to the incorporation of extra training signals, including multi-task learning (Weiss et al., 2017; Liu et al., 2019b), submodule pretraining (Bansal et al., 2019; Stoian et al., 2020; Wang et al., 2020), knowledge distillation (Liu et al., 2019a), meta-learning (Indurthi et al., 2019), and data augmentation (Jia et al., 2019; Pino et al., 2019). Our work focuses on E2E ST, but we investigate feature selection, which has rarely been studied before.…”