“…BERT-based LMs (Devlin et al., 2019) have demonstrated the ability to encode various linguistic and hierarchical properties (Lin et al., 2019; Jawahar et al., 2019; Jo and Myaeng, 2020), which have a positive effect on downstream performance (Liu et al., 2019a; Miaschi et al., 2020) and serve as an inspiration for syntax-oriented architecture improvements (Bai et al., 2021; Ahmad et al., 2021; Sachan et al., 2021). In addition, a variety of pre-training objectives have been introduced (Liu et al., 2020a), some of which model the reconstruction of perturbed word order (Lewis et al., 2020; Tao et al., 2021; Panda et al., 2021).…”
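
For concreteness, here is a minimal sketch (in Python, and not drawn from any of the cited papers) of how a perturbed-word-order objective of this kind might prepare its training pairs: the model's input is a shuffled copy of the sentence, and the target is the original order it must reconstruct.

    import random

    def permute_word_order(tokens, seed=None):
        # Build a (noised input, target) pair for a word-order
        # reconstruction objective: the input is a random permutation
        # of the tokens; the target is the original sequence.
        rng = random.Random(seed)
        noised = list(tokens)  # copy so the original stays untouched
        rng.shuffle(noised)
        return noised, list(tokens)

    # A seq2seq LM would be trained to map the shuffled input
    # back to the original order.
    source, target = permute_word_order(
        ["the", "cat", "sat", "on", "the", "mat"], seed=0)
    print("input :", source)   # shuffled word order
    print("target:", target)   # original order to reconstruct

Real objectives of this family (e.g., BART-style noising) typically perturb subword sequences or sentence order rather than whole words, but the input/target structure is analogous.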