“…To date, there are few, if any, Jackson- or Bernstein-type results for sequence modelling using the Transformer. We mention a related series of works on static function approximation with a variant of the Transformer architecture [1,48,49]. Here, the targets are continuous functions $H : [0,1]^{\tau} \to K$, where $K \subset \mathbb{R}^{n}$ is a compact set.…”