Furthermore, recent studies have proposed leveraging advances in white-box model interpretability (Geva et al., 2022b; Li et al., 2022; Mallen et al., 2022; Mickus et al., 2022; Meng et al., 2023; Geva et al., 2023) and probing (Adi et al., 2017; Conneau et al., 2018; Voita et al., 2019; Slobodkin et al., 2021) to manipulate model predictions and to analyze when LLMs struggle to answer questions. Other recent work has used beam search decoding to steer generated outputs by exploiting the information encapsulated in multiple beams (Meister et al., 2020; Leblond et al., 2021; Slobodkin et al., 2023; Wan et al., 2023b). Finally, early exiting in language models (Schwartz et al., 2020; Schuster et al., 2022; Din et al., 2023) and model prediction calibration (Desai and Durrett, 2020; Jiang et al., 2021; Dhuliawala et al., 2022; Geva et al., 2022a) are closely related to our work, as they also analyze and improve model predictions and output distributions.