“…Since it is difficult to train an end-to-end ST model directly, training techniques such as pretraining (Weiss et al., 2017; Berard et al., 2018; Bansal et al., 2019; Stoian et al., 2020; Wang et al., 2020b; Dong et al., 2021a; Alinejad and Sarkar, 2020; Zheng et al., 2021b), multi-task learning (Le et al., 2020; Vydana et al., 2021; Tang et al., 2021b; Ye et al., 2021; Tang et al., 2021a), curriculum learning (Kano et al., 2017; Wang et al., 2020c), and meta-learning (Indurthi et al., 2020) have been applied. Recent work has applied mixup to machine translation (Zhang et al., 2019b; Guo et al., 2022; Fang and Feng, 2022), sentence classification (Chen et al., 2020; Jindal et al., 2020; Sun et al., 2020), multilingual understanding, and speech recognition (Medennikov et al., 2018; Sun et al., 2021; Lam et al., 2021a; Meng et al., 2021), and obtained improvements.…”
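To make the mixup idea referenced above concrete, here is a minimal sketch of the classic input-level formulation (Zhang et al., 2019b): draw an interpolation coefficient from a Beta distribution and linearly mix a pair of examples and their labels. The function name, the `alpha` value, and the toy data are illustrative assumptions, not the recipe of any specific cited paper.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=np.random.default_rng()):
    """Input-level mixup: interpolate a pair of inputs and their
    (soft/one-hot) labels. `alpha` shapes the Beta distribution;
    the value 0.2 here is a common illustrative choice, not one
    taken from the cited ST work."""
    lam = rng.beta(alpha, alpha)          # interpolation coefficient in [0, 1]
    x_mix = lam * x1 + (1.0 - lam) * x2   # mixed input (e.g., feature vectors)
    y_mix = lam * y1 + (1.0 - lam) * y2   # correspondingly mixed label
    return x_mix, y_mix

# Toy usage: mix two 4-dim feature vectors with one-hot labels.
x_a, x_b = np.ones(4), np.zeros(4)
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_m, y_m = mixup(x_a, y_a, x_b, y_b)
```

The task-specific variants cited above (for translation, classification, and speech recognition) adapt where the interpolation happens, for example at the embedding or hidden-representation level rather than on raw inputs, but the mixing rule itself is the same convex combination shown here.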