“…We evaluated our approach on a range of tasks: long-sequence ListOps (Nangia and Bowman, 2018), byte-level text classification (Maas et al., 2011), document retrieval on the ACL Anthology Network (Radev et al., 2013), and Pathfinder (Linsley et al., 2018). We compared our Kerformer model with Local Attention (Tay et al., 2020), Reformer (Kitaev et al., 2020), Performer (Choromanski et al., 2020), Longformer (Beltagy et al., 2020), Transformer (Vaswani et al., 2017), BigBird (Zaheer et al., 2020), and DCT-Former (Scribano et al., 2023); the results against these seven baselines are shown in Table 5. As shown in Table 5, Kerformer achieved the best performance on ListOps and Document Retrieval, competitive results on the other two tasks, and the second-best overall average accuracy across all tasks.…”