…These samples comprised 500–3216 sagittal slices per run, of which 113–2808 were analysed after discarding time points that were unlikely to be of interest (i.e., retaining only time points in which the participant produced audible sound, as identified from synchronised audio data). The samples covered a variety of speech and non-speech behaviours drawn from several research disciplines (see Table 1), including spoken monosyllables in British English, connected speech in German (Carignan et al., 2020), French (Isaieva et al., 2021), American English spoken by a native (L1) speaker (Narayanan et al., 2014), and American English spoken by a non-native speaker (Lim et al., 2021), as well as non-speech vocal behaviours including vocal size exaggeration (Belyk et al., 2022), laughter (Belyk & McGettigan, 2022), and whistling (Belyk et al., 2019). This sample reflects the natural variation in imaging parameters, and correspondingly in image quality, that analysts may face in practical application (see Fig.…
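The frame-selection step described above (retaining only time points with audible sound, as identified from synchronised audio) can be sketched as a simple amplitude-based filter. The snippet below is a minimal illustration under assumed inputs, not the pipeline used here: the function name `audible_frame_indices`, the frame rate, and the RMS threshold are all hypothetical.

```python
# Illustrative sketch only: select imaging time points that coincide with
# audible sound in a synchronised audio recording. Names, frame rate, and
# threshold are assumptions, not taken from the analysis pipeline.

import numpy as np


def audible_frame_indices(audio, audio_rate, frame_rate, rms_threshold=0.02):
    """Return indices of imaging frames during which the audio is audible.

    audio         : 1-D array of audio samples aligned to the image series
    audio_rate    : audio sampling rate in Hz
    frame_rate    : imaging frame rate in frames per second
    rms_threshold : RMS amplitude above which a frame counts as audible
    """
    samples_per_frame = int(round(audio_rate / frame_rate))
    n_frames = len(audio) // samples_per_frame

    keep = []
    for i in range(n_frames):
        # Audio window corresponding to this imaging frame
        window = audio[i * samples_per_frame:(i + 1) * samples_per_frame]
        rms = np.sqrt(np.mean(window.astype(float) ** 2))
        if rms > rms_threshold:
            keep.append(i)
    return keep


# Example: a 10 s recording at 16 kHz with sound only during the middle 4 s,
# paired with images acquired at 50 frames per second.
rng = np.random.default_rng(0)
audio = np.zeros(10 * 16000)
audio[3 * 16000:7 * 16000] = 0.1 * rng.standard_normal(4 * 16000)
kept = audible_frame_indices(audio, audio_rate=16000, frame_rate=50)
print(f"retained {len(kept)} of {10 * 50} frames")
```

In practice the threshold would need to be set relative to the scanner's acoustic noise floor in the recorded audio, which is why the retained frame counts vary so widely across runs.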