“…Such remarks can also be made about the many probabilistic or non-probabilistic bottom-up, top-down, or left-corner parsing algorithms which have been studied over the years as models of sentence processing (Earley, 1970; Rosenkrantz and Lewis, 1970; Marcus, 1978; Abney and Johnson, 1991; Berwick and Weinberg, 1982; Roark, 2001; Nivre, 2008; Stabler, 2013; Graf et al., 2017). Likewise for transformer- or RNN-based parsing models (e.g., Costa, 2003; Jin and Schuler, 2020; Yang and Deng, 2020; Hu et al., 2021, 2022) or causal language models (Hochreiter and Schmidhuber, 1997; Radford et al., 2018, 2019; Dai et al., 2019; Brown et al., 2020). The amount of work required by these algorithms to integrate or predict the next word scales with quantities such as the size of the vocabulary and the length of the input, but never directly as a function of the probability of the next word.…”
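The scaling claim can be illustrated with a minimal sketch (not drawn from any of the cited works; the vocabulary size and hidden dimension below are assumed toy values): a causal language model computes its next-word distribution with one output projection and a softmax, so the work done is fixed by the vocabulary size and hidden dimension, regardless of how probable or surprising the actual next word turns out to be.

```python
# Minimal sketch, assuming a toy causal LM output layer: the cost of predicting
# the next word depends on vocab_size and hidden_dim, never on the probability
# of the word that actually comes next.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 50_000, 512          # assumed toy dimensions

hidden_state = rng.standard_normal(hidden_dim)               # context encoding
output_matrix = rng.standard_normal((vocab_size, hidden_dim))  # output projection

# One matrix-vector product plus a softmax: O(vocab_size * hidden_dim) work.
logits = output_matrix @ hidden_state
probs = np.exp(logits - logits.max())
probs /= probs.sum()

likely_word, unlikely_word = int(probs.argmax()), int(probs.argmin())
# Surprisal (-log p) differs greatly between these two words...
print(-np.log(probs[likely_word]), -np.log(probs[unlikely_word]))
# ...but the floating-point operations spent to obtain either value are identical.
```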