“…The importance of word order in LMs has been a topic of debate, with various works claiming that downstream performance is not affected by scrambled inputs (Malkin et al., 2021; Sinha et al., 2021), although it has been shown that LMs retain a notion of word order through their positional embeddings (Abdou et al., 2022). It has been argued that LMs acquire an abstract notion of word order that goes beyond mere n-gram co-occurrence statistics (Futrell and Levy, 2019; Kuribayashi et al., 2020; Merrill et al., 2024), a claim that we assess in this paper for large-scale LMs in the context of adjective order. Finally, numerous works have investigated the trade-off between memorization and generalization in LMs: larger LMs have been shown to memorize entire passages from the training data (Biderman et al., 2023a; Lesci et al., 2024; Prashanth et al., 2024), but LMs have also been shown to generalize grammatical phenomena in human-like ways (Dankers et al., 2021; Hupkes et al., 2023; Alhama et al., 2023).…”