Purpose: The language that children hear early in life is associated with their speech-language outcomes. This line of research relies on naturalistic observations of children's language input, often captured with daylong audio recordings. However, the large quantity of data that daylong recordings generate requires novel analytical tools to feasibly parse thousands of hours of naturalistic speech. This study outlines a new approach to efficiently process and sample from daylong audio recordings made in two bilingual communities, Spanish–English in the United States and Quechua–Spanish in Bolivia, to derive estimates of children's language exposure.
Method: We employed a general sampling with replacement technique to efficiently estimate two key elements of children's early language environments: (a) proportion of child-directed speech (CDS) and (b) dual language exposure. Proportions estimated from random sampling of 30-s segments were compared to those from annotations over the entire daylong recording (every other segment), as well as parental report of dual language exposure.
Results: Results showed that approximately 49 min from each recording or just 7% of the overall recording was required to reach a stable proportion of CDS and bilingual exposure. In both speech communities, strong correlations were found between bilingual language estimates made using random sampling and all-day annotation techniques. A strong association was additionally found for CDS estimates in the United States, but this was weaker at the Bolivian site, where CDS was less frequent. Dual language estimates from the audio recordings did not correspond well to estimates derived from parental report collected months apart.
Conclusions: Daylong recordings offer tremendous insight into children's daily language experiences, but they will not become widely used in developmental research until data processing and annotation time substantially decrease. We show that annotation based on random sampling is a promising approach to efficiently estimate ambient characteristics from daylong recordings that cannot currently be estimated via automated methods.
Purpose: The language that children hear early in life predicts their later speech-language outcomes (Hoff 2003; Weisleder & Fernald 2013). This line of research relies on naturalistic observations of children’s language input, often captured with daylong audio recordings. But the large quantity of data that daylong recordings generate requires novel analytical tools to feasibly parse thousands of hours of naturalistic speech. This study outlines a workflow to efficiently process and sample from daylong audio recordings made in two bilingual communities: Spanish-English in the United States and Quechua-Spanish in Bolivia.
Method: We employed a general sampling with replacement technique to efficiently estimate two key elements of children’s early language environments: 1) proportion of child-directed speech and 2) dual language exposure. Proportions estimated from random sampling of 30-second segments were compared to those from annotations over the entire daylong recording (every-other-segment), as well as parental report.
Results: Results showed that approximately 49 minutes from each recording, or just 7% of the overall recording, were required to reach a stable proportion of child-directed speech and bilingual exposure. In both speech communities, strong correlations were found between bilingual language estimates made using random sampling and all-day annotation techniques. A strong relationship was additionally found for child-directed speech estimates in the United States, but this was weaker at the Bolivian site, where child-directed speech was less frequent. Furthermore, dual language estimates from the daylong audio recordings did not correspond to estimates derived from parental report.
Conclusions: Random sampling is a valid method to estimate ambient characteristics from daylong recordings. However, caution should be taken when interpreting estimates of low-frequency categories and practitioners might consider collecting multiple daylong recordings to accurately estimate characteristics of children’s language exposure.
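The random-sampling idea described in the Method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the synthetic segment labels, and the stability rule (the running proportion varying by no more than a tolerance over a trailing window of draws) are all assumptions, since the abstract does not state how stability was determined. For scale, 49 minutes of audio corresponds to 98 thirty-second segments.

```python
import random


def running_proportions(labels, n_draws, seed=0):
    """Draw segments with replacement and return the cumulative proportion
    of positive labels (e.g. 1 = child-directed speech) after each draw."""
    rng = random.Random(seed)
    hits = 0
    props = []
    for i in range(1, n_draws + 1):
        hits += rng.choice(labels)  # sampling with replacement
        props.append(hits / i)
    return props


def draws_until_stable(props, window=20, tol=0.02):
    """Return the first draw count at which the last `window` running
    proportions span at most `tol`, or None if that never happens.
    The window and tolerance are hypothetical choices."""
    for end in range(window, len(props) + 1):
        recent = props[end - window:end]
        if max(recent) - min(recent) <= tol:
            return end
    return None


# Hypothetical recording: 1,400 thirty-second segments, 25% labelled CDS.
labels = [1] * 350 + [0] * 1050
props = running_proportions(labels, n_draws=400, seed=1)
n = draws_until_stable(props)
if n is not None:
    print(f"stable after {n} draws ({n * 30 / 60:.1f} min of audio)")
```

With a rare category (a small share of 1s in `labels`), the running proportion rests on fewer positive draws, which is consistent with the caution above about low-frequency categories.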