“…For technical reasons, we are limited to calculating mutual information based on the joint frequencies of part-of-speech pairs, rather than wordforms. We use part-of-speech tags because getting a reliable estimate of mutual information from the observed frequencies of wordforms is statistically difficult: very large samples are required to overcome estimation bias (Archer, Park, & Pillow, 2013; Basharin, 1959; Bentz, Alikaniotis, Cysouw, & Ferrer-i-Cancho, 2017; Futrell et al., 2019; Miller, 1955; Paninski, 2003). The estimation problem is less severe, however, for joint counts over coarser-grained categories, where there is no long tail of one-off forms.…”
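
The bias described in the passage can be illustrated with a small simulation. The sketch below is not the authors' procedure; it is a minimal illustration under stated assumptions. It draws pairs whose two members are statistically independent, so the true mutual information is zero, and applies the simple plug-in (maximum-likelihood) estimator to a coarse inventory and to a long-tailed one. The function names `plugin_mi` and `sample_independent_pairs`, the inventory sizes (15 and 5000), the Zipf-like weights, and the sample size are all hypothetical choices made for the example.

```python
import math
import random
from collections import Counter


def plugin_mi(pairs):
    """Plug-in (maximum-likelihood) mutual information estimate, in bits."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of raw counts
        mi += (c / n) * math.log2(c * n / (left[x] * right[y]))
    return mi


def sample_independent_pairs(n_categories, n_samples, rng):
    """Draw pairs with independent members, each from a Zipf-like distribution.

    Assumption for illustration only: the weight of the category at rank r
    is proportional to 1 / r.
    """
    weights = [1.0 / (rank + 1) for rank in range(n_categories)]
    xs = rng.choices(range(n_categories), weights=weights, k=n_samples)
    ys = rng.choices(range(n_categories), weights=weights, k=n_samples)
    return list(zip(xs, ys))


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 5000

# Coarse inventory, loosely analogous to part-of-speech tags.
coarse = plugin_mi(sample_independent_pairs(15, n_samples, rng))
# Long-tailed inventory, loosely analogous to wordforms.
fine = plugin_mi(sample_independent_pairs(5000, n_samples, rng))

print(f"true MI in both cases: 0 bits")
print(f"plug-in estimate, 15 categories:   {coarse:.3f} bits")
print(f"plug-in estimate, 5000 categories: {fine:.3f} bits")
```

Because the two members of each pair are independent by construction, any positive estimate is pure estimation bias. With the coarse inventory the estimate should stay close to zero, while with the long-tailed inventory many pairs are observed only once and the estimate is inflated by a wide margin at the same sample size.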