“…The loss is calculated from variable assignments out at each step, and the sum of all losses is minimized. Using the loss at each time step has been shown to improve performance [Palm et al., 2017, Amizadeh et al., 2019b, Ozoliņš et al., 2020] compared with a single loss calculation at the end. It also enables using many more steps in evaluation than in training.…”
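
The difference between summing the loss over every step and computing it only at the end can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the cited authors' model: `run_steps` is a hypothetical toy recurrent update that moves a scalar halfway toward a target at each step, and `squared_error` stands in for whatever loss is computed from the variable assignments.

```python
from typing import List


def run_steps(x0: float, target: float, num_steps: int) -> List[float]:
    """Hypothetical toy recurrent model: each step moves x halfway to target.

    Returns the output produced at every step, not just the last one.
    """
    outputs = []
    x = x0
    for _ in range(num_steps):
        x = x + 0.5 * (target - x)
        outputs.append(x)
    return outputs


def squared_error(output: float, target: float) -> float:
    """Stand-in per-step loss (an assumption, chosen for simplicity)."""
    return (output - target) ** 2


def summed_step_loss(outputs: List[float], target: float) -> float:
    """Loss at each time step, summed over all steps."""
    return sum(squared_error(o, target) for o in outputs)


def final_step_loss(outputs: List[float], target: float) -> float:
    """Single loss computed only from the last step's output."""
    return squared_error(outputs[-1], target)


# "Training" with 3 steps: compare the two loss definitions.
train_outputs = run_steps(0.0, 1.0, 3)
print(summed_step_loss(train_outputs, 1.0))  # 0.25 + 0.0625 + 0.015625
print(final_step_loss(train_outputs, 1.0))

# "Evaluation" with more steps than in training: the same update rule
# is simply unrolled further.
eval_outputs = run_steps(0.0, 1.0, 10)
print(final_step_loss(eval_outputs, 1.0))
```

Because the summed loss scores every intermediate output, the model is not tied to producing a good answer only at one fixed step count, which is one plausible reading of why the same recurrent update can be unrolled for more steps at evaluation time.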