Transformer base (Vaswani et al., 2017)†        27.30           -
Transformer base (Vaswani et al., 2017)         28.10           25.36
+ Freq-Exponential (Gu et al., 2020)            28.43 (+0.33)   24.99 (-0.37)
+ Freq-Chi-Square (Gu et al., 2020)             28.47 (+0.37)   25.43 (+0.07)
+ BMI-adaptive (Xu et al., 2021)                28.56 (+0.45)   25.77 (+0.41)
+ Focal Loss (Lin et al., 2017)                 28.43 (+0.33)   25.37 (+0.01)
+ Anti-Focal Loss (Raunak et al., 2020)         28.65 (+0.55)   25.50 (+0.14)
+ Self-Paced Learning (Wan et al., 2020)        28.69 (+0.59)   25.75 (+0.39)
+ Simple Fusion (Stahlberg et al., 2018)        27.82 (-0.28)   23.91 (-1.45)
+ LM Prior (Baziotis et al., 2020)              28 […]

Transformer base (Vaswani et al., 2017)         29.31           25.48
+ Freq-Exponential (Gu et al., 2020)            29.66 (+0.35)   25.57 (+0.09)
+ Freq-Chi-Square (Gu et al., 2020)             29.64 (+0.33)   25.64 (+0.14)
+ BMI-adaptive (Xu et al., 2021)                29.69 (+0.38)   25.81 (+0.33)
+ Focal Loss (Lin et al., 2017)                 29.65 (+0.34)   25.54 (+0.06)
+ Anti-Focal Loss (Raunak et al., 2020)         29.72 (+0.41)   25.64 (+0.16)
+ Self-Paced Learning (Wan et al., 2020)        29 […]

[…] (9) and (12). We fix the scale s to 0.3 and tune the scale t in a similar way.
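To make one of the reweighting baselines in the table concrete, below is a minimal sketch of the per-token focal loss of Lin et al. (2017), which down-weights tokens the model already predicts confidently. The function name, the gamma default, and the probabilities are illustrative, not taken from this paper.

```python
import math

def focal_loss(p_correct, gamma=2.0):
    """Focal loss (Lin et al., 2017) for a single token.

    p_correct: model probability assigned to the gold token.
    gamma: focusing parameter; gamma = 0 recovers plain cross-entropy.
    """
    return -((1.0 - p_correct) ** gamma) * math.log(p_correct)

# A confidently predicted token is down-weighted relative to cross-entropy,
# shifting training signal toward harder (e.g. low-frequency) tokens.
ce = -math.log(0.9)      # ordinary cross-entropy at p = 0.9
fl = focal_loss(0.9)     # focal loss at p = 0.9, gamma = 2
assert fl < ce
assert abs(focal_loss(0.9, gamma=0.0) - ce) < 1e-12
```

The anti-focal variant in the table (Raunak et al., 2020) inverts this idea, arguing that in generation the easy, high-frequency tokens should not be suppressed as aggressively.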