“…In the working memory implementation (Sagirova & Burtsev, 2022), memory is represented by M additional tokens in the decoder input. The Transformer decoder generates, stores, and retrieves the M working memory tokens in the same way it predicts the translation sequence.…”
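
A minimal PyTorch sketch of this idea follows. It is not the authors' code: the class name, the hyperparameters (`num_memory_tokens` as M, `d_model`, layer counts), and the choice to append the memory tokens after the target sequence are all illustrative assumptions. The point it demonstrates is that the memory tokens pass through the same embedding, self-attention, and output softmax as ordinary target tokens, so the decoder generates and re-reads them like any other positions.

```python
import torch
import torch.nn as nn


class WorkingMemoryDecoder(nn.Module):
    """Sketch: a Transformer decoder whose input is the target sequence
    extended with M working-memory tokens, all scored by one LM head."""

    def __init__(self, vocab_size, d_model=512, nhead=8,
                 num_layers=6, num_memory_tokens=10):
        super().__init__()
        self.num_memory_tokens = num_memory_tokens  # M (assumed value)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tgt_tokens, memory_tokens, encoder_out):
        # Append the M memory tokens to the decoder input; self-attention
        # then writes to and reads from them like any target position.
        seq = torch.cat([tgt_tokens, memory_tokens], dim=1)  # (B, T + M)
        T = seq.size(1)
        causal_mask = torch.triu(
            torch.full((T, T), float("-inf"), device=seq.device), diagonal=1)
        h = self.decoder(self.embed(seq), encoder_out, tgt_mask=causal_mask)
        # One softmax over the shared vocabulary scores target and memory
        # positions alike, so memory tokens are generated, not copied.
        return self.lm_head(h)


# Toy usage: batch of 2, target length 7, M = 10 memory slots.
dec = WorkingMemoryDecoder(vocab_size=32000)
tgt = torch.randint(0, 32000, (2, 7))
mem = torch.randint(0, 32000, (2, dec.num_memory_tokens))
enc = torch.randn(2, 15, 512)   # encoder output, shape (B, S, d_model)
logits = dec(tgt, mem, enc)     # (2, 17, 32000)
```

Because the memory slots share the decoder's vocabulary and prediction head, no separate read/write module is needed; the cost is simply M extra positions in every decoder step.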