“…Transformers. There is a long line of work investigating the capabilities [Vaswani et al., 2017, Dehghani et al., 2018, Yun et al., 2019, Pérez et al., 2019, Yao et al., 2021, Bhattamishra et al., 2020b, Zhang et al., 2022], limitations [Hahn, 2020, Bhattamishra et al., 2020a], applications [Lu et al., 2021a, Dosovitskiy et al., 2020, Parmar et al., 2018], and internal workings [Elhage et al., 2021, Snell et al., 2021, Weiss et al., 2021, Edelman et al., 2022, Olsson et al., 2022] of Transformer models. Most similar to our work, Müller et al. [2021] introduce a "Prior-data fitted transformer network" that is trained to approximate Bayesian inference and generate predictions for downstream learning problems.…”