“…For a fair comparison, Table 6 only includes models that use the same baseline training strategy as ours. Thus, we exclude approaches that depend on other models for expansion [25,33,51], costly training techniques such as knowledge distillation [9,17,18,38,41,44], or special pretraining [11,20,34] (see Table 8 for more comparisons).…”