“…Our work and recent independent efforts (Weerts et al., 2021) suggest that, despite some predictive accuracy in neuroimaging studies (Millet & King, 2021; Kell, Yamins, Shook, Norman-Haignere, & McDermott, 2018; but see Thompson, Bengio, & Schoenwiesner, 2019), automatic speech recognition systems and humans diverge substantially in various perceptual domains. Our results further suggest that, far from being simply quantitative (e.g., receptive field sizes), these shortcomings are likely qualitative (e.g., lack of flexibility in task performance through exploiting alternative spectrotemporal scales) and would not be solved by such strategies as introducing different training regimens or increasing the models' capacity.…”