“…Among the new problems, a noteworthy one is inference efficiency: comparisons with standard methods, which typically rely on models of limited size (100-300M parameters), should account for this aspect, which is critical for social, economic, and environmental reasons (Strubell et al., 2019). Along this line, important research directions include i) pruning the LLM (and possibly the SFM) in a task-aware manner (Zhu et al., 2023b; Dery et al., 2024), ii) dynamic layer selection during decoding (Xin et al., 2020; Geva et al., 2022; Xia et al., 2024), and iii) efficient decoding strategies such as speculative and parallel decoding (Stern et al., 2018; Chen et al., 2023a; Leviathan et al., 2023; Santilli et al., 2023). In addition, the speech source carries a wide range of information that can be exploited depending on the paradigm used (e.g., prosody is not handled by cascade systems; Zhou et al., 2024).…”
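To make the efficient-decoding direction concrete, the following is a minimal toy sketch of greedy speculative decoding in the spirit of Leviathan et al. (2023) and Chen et al. (2023a): a cheap draft model proposes a block of tokens, and a single (batched) pass of the expensive target model verifies them. The functions `target_next` and `draft_next` are hypothetical stand-ins for real models, and the parameter `k` (draft block size) is an assumed illustrative value; this is not any specific system's implementation.

```python
# Toy sketch of greedy speculative decoding. `target_next` and `draft_next`
# are hypothetical stand-in "models": each maps a token sequence to the
# greedily chosen next token (integers play the role of token ids).

def target_next(seq):
    # Toy "large" target model: a deterministic rule over the context.
    return (3 * seq[-1] + len(seq)) % 10

def draft_next(seq):
    # Toy "small" draft model: agrees with the target on most contexts.
    t = (3 * seq[-1] + len(seq)) % 10
    return t if seq[-1] % 3 else (t + 1) % 10

def speculative_decode(prompt, n_new, k=4):
    """Greedily generate n_new tokens, verifying k draft tokens per target call."""
    seq = list(prompt)
    target_calls = 0
    while len(seq) < len(prompt) + n_new:
        # 1) The draft model proposes up to k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) One target pass scores all drafted positions (batched in practice).
        verified = [target_next(seq + draft[:i]) for i in range(len(draft) + 1)]
        target_calls += 1
        # 3) Accept the longest prefix where draft and target agree, then
        #    append the target's own token at the first disagreement.
        n_acc = 0
        while n_acc < len(draft) and draft[n_acc] == verified[n_acc]:
            n_acc += 1
        seq += draft[:n_acc] + [verified[n_acc]]
    return seq[:len(prompt) + n_new], target_calls
```

Because acceptance requires exact agreement with the target's greedy choice, the output is identical to plain greedy decoding with the target model alone, while the number of expensive target calls drops whenever the draft guesses several tokens correctly.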