“…It shows how the minimum and maximum values of the softmax input vary as training proceeds. The values increase on the order of two, a range that previous approximation methods (Lee et al., 2022b; Hong et al., 2022; Jin et al., 2020) cannot handle. This shows that, with the previous approximation methods, it is hard to train a model for as long as we want.…”