“…For example, Seq2Sick (Cheng et al, 2020) generates adversarial examples that decrease the BLEU score of neural machine translation models. Beyond accuracy, inference efficiency is critical for many real-time applications, e.g., speech recognition (Wang et al, 2022), machine translation (Fan et al, 2021; Zhu et al, 2020), and lyrics transcription (Gao et al, 2022b, 2023, 2022a). Recently, NICGSlowDown and NMT-Sloth (Chen et al, 2022d,c) proposed delaying the appearance of the end token to degrade the efficiency of language generative models.…”