“…6,7,8] exploited transfer learning from ASR and MT, showing, for instance, that pre-training the ST encoder on ASR data can yield significant improvements. On the data side, the most promising approach is data augmentation, which has been explored through knowledge distillation from a neural MT (NMT) model [9], synthesizing monolingual MT data in the source language [10], multilingual training [11], and translating monolingual ASR data into the target language [10,12,13]. Nevertheless, despite claims by large industrial players operating in rich-data conditions [10], top results at recent shared tasks [13] show that effectively exploiting the scarce available training data remains a crucial issue in reducing the performance gap with cascade ST solutions.…”
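To make the knowledge-distillation option concrete, the sketch below shows one common way such a loss can be set up: a frozen NMT teacher, run on the source-language transcript, provides per-token output distributions that the direct ST student is trained to match alongside the usual cross-entropy on the reference translation. This is a minimal illustration under stated assumptions, not the implementation of [9]; the function name `kd_loss`, the mixing weight `alpha`, and the tensor shapes are hypothetical choices.

```python
# Minimal word-level knowledge-distillation sketch (PyTorch), assuming
# precomputed student (ST) and teacher (NMT) logits over the same target
# vocabulary. All names here are illustrative, not from the cited works.
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,   # (batch, seq_len, vocab)
            teacher_logits: torch.Tensor,   # (batch, seq_len, vocab), frozen teacher
            target_ids: torch.Tensor,       # (batch, seq_len) gold target tokens
            pad_id: int,
            alpha: float = 0.5,
            temperature: float = 1.0) -> torch.Tensor:
    """Mix cross-entropy on the reference translation with a KL term that
    pulls the ST student's per-token distribution toward the NMT teacher's."""
    vocab = student_logits.size(-1)
    # Standard label loss on the gold translation.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         target_ids.view(-1),
                         ignore_index=pad_id)
    # Distillation term: per-token KL divergence to the teacher's softmax.
    t = temperature
    kl = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  F.softmax(teacher_logits / t, dim=-1),
                  reduction="none").sum(-1)      # sum over vocabulary
    mask = target_ids.ne(pad_id)                 # ignore padded positions
    kl = (kl * mask).sum() / mask.sum()
    # t*t rescaling keeps gradient magnitudes comparable across temperatures.
    return alpha * ce + (1.0 - alpha) * (t * t) * kl
```

In this formulation the teacher's distributions act as soft targets that carry more signal per token than the one-hot reference, which is one intuition for why distillation helps in the scarce-data ST regime the passage describes.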