“…First, we used either the automatic utterance boundaries provided by the LENA software (“A,” short for “automatic boundaries”) or boundaries obtained by combining the text of segments that coders had labeled as continuations of each other (“H,” for “human boundaries”). Second, since performance depends on corpus size (see Bernard et al., 2018), we had three versions of each CDS corpus: the full one, a CDS corpus shortened to match the ADS corpus in number of words, and a CDS corpus shortened to match the ADS corpus in number of utterances. After crossing these two factors, performance could be compared between ADS-A/H (ADS with automatic or human utterance boundaries), on the one hand, and, on the other, one of (1) CDS-A/H-full (the corresponding full CDS corpus), (2) CDS-A/H-WM (CDS cut at the same number of word tokens as found in the corresponding ADS corpus), or (3) CDS-A/H-UM (CDS cut at the same number of utterances).…”
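
The size-matching step described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each corpus is a list of utterances, that each utterance is a list of word tokens, that the CDS corpus is cut from its beginning, and that the word-matched cut may truncate the final utterance so that token counts agree exactly; the excerpt specifies none of these details. All function names (`count_words`, `cut_at_words`, `cut_at_utterances`, `cds_versions`) are hypothetical. The same procedure would be applied once per boundary version (A and H).

```python
from typing import Dict, List

Utterance = List[str]      # one utterance = a list of word tokens
Corpus = List[Utterance]   # one corpus = a list of utterances


def count_words(corpus: Corpus) -> int:
    """Total number of word tokens in the corpus."""
    return sum(len(utt) for utt in corpus)


def cut_at_utterances(corpus: Corpus, n_utterances: int) -> Corpus:
    """Keep the first n_utterances utterances (assumption: cut from the start)."""
    return corpus[:n_utterances]


def cut_at_words(corpus: Corpus, n_words: int) -> Corpus:
    """Keep utterances from the start until n_words tokens have been collected.

    Assumption: the last kept utterance is truncated if needed, so the
    result has exactly n_words tokens whenever the corpus is long enough.
    """
    kept: Corpus = []
    remaining = n_words
    for utt in corpus:
        if remaining <= 0:
            break
        piece = utt[:remaining]
        kept.append(piece)
        remaining -= len(piece)
    return kept


def cds_versions(cds: Corpus, ads: Corpus) -> Dict[str, Corpus]:
    """Build the three CDS versions compared against one ADS corpus."""
    return {
        "full": cds,                                   # CDS-*-full
        "WM": cut_at_words(cds, count_words(ads)),     # word-matched
        "UM": cut_at_utterances(cds, len(ads)),        # utterance-matched
    }


# Toy data invented purely for illustration; not taken from any corpus.
toy_ads = [["look", "at", "that"], ["okay"]]
toy_cds = [["hi", "baby"], ["where", "is", "the", "ball"], ["there", "it", "is"]]
versions = cds_versions(toy_cds, toy_ads)
```

With the toy data, `versions["WM"]` holds four word tokens (the ADS token count) and `versions["UM"]` holds two utterances (the ADS utterance count), while `versions["full"]` is the unmodified CDS corpus.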