“…On the other hand, structured pruning (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023) removes entire structures such as neurons, weight-matrix blocks, or layers. Most prior work on structured pruning has focused on encoder-based models (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023), removing attention heads and rows or columns of weight matrices according to importance metrics such as weight magnitudes, Hessian-based scores, or an L0 regularization loss. Structured pruning of generative models, however, remains significantly underexplored, with only a few available works (Lagunas et al., 2021; Yang et al., 2022; Santacroce et al., 2023).…”
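To make the idea concrete, here is a minimal sketch (not any of the cited methods) of magnitude-based structured pruning for the hidden neurons of a two-layer MLP: each neuron is scored by the L2 norm of its incoming weights, and pruning a neuron removes an entire row of the first weight matrix and the corresponding column of the second, so the resulting network is genuinely smaller rather than merely sparse. The function name and shapes are illustrative assumptions.

```python
import numpy as np

def prune_neurons(W1, b1, W2, keep_ratio=0.5):
    """Structurally prune hidden neurons of a two-layer MLP.

    The MLP computes h = relu(x @ W1.T + b1); y = h @ W2.T.
    Neuron i's importance score is the L2 norm of row i of W1
    (its incoming weights). Dropping neuron i removes row i of
    W1 and b1 and column i of W2.
    """
    importance = np.linalg.norm(W1, axis=1)      # one score per hidden neuron
    k = max(1, int(keep_ratio * W1.shape[0]))    # number of neurons to keep
    keep = np.sort(np.argsort(importance)[-k:])  # indices of surviving neurons
    return W1[keep], b1[keep], W2[:, keep]

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2 = rng.normal(size=(3, 8))
W1p, b1p, W2p = prune_neurons(W1, b1, W2, keep_ratio=0.5)
print(W1p.shape, W2p.shape)  # (4, 4) (3, 4)
```

The same pattern generalizes to the structures mentioned above: scoring attention heads or whole layers by an importance metric (magnitude, Hessian-based, or a learned L0 gate) and removing the lowest-scoring units wholesale.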