“…The most commonly used algorithm, part of the Language ENvironment Analysis (LENA) system, was trained on data from English-learning infants and young children (up to four years old), but in recent years it has been used with a much wider range of ages and languages (see Ganek and Eriks-Brophy 2018 for a review). More recently, open-sourced alternatives to LENA have been developed by members of the ACLEW project (Lavechin, Bousbib, Bredin, Dupoux, & Cristia, 2021; Räsänen et al., 2019; Räsänen, Seshadri, Lavechin, Cristia, & Casillas, 2020), including a system to identify speakers and another to count words, syllables, and phones, all trained on multilingual datasets (henceforth, the ACLEW pipeline). Independent assessments comparing automated annotations against human ones suggest that the accuracy of the LENA and ACLEW algorithms varies widely across participants (even across English-speaking participants: Cristia, Lavechin, et al. 2020; Lehet, Arjmandi, Dilley, and Houston 2020; Räsänen et al. 2020).…”