Daily interactions between children and their parents are crucial for spoken language skills and overall development. Capturing such interactions can help provide meaningful feedback to parents as well as practitioners. Naturalistic audio capture and the development of a speech processing pipeline for parent-child interactions are challenging problems. One of the first important steps in the speech processing pipeline is Speaker Diarization, i.e., identifying who spoke when. Speaker Diarization is the task of partitioning a captured audio stream into homogeneous segments according to the speaker's (child's or parent's) identity. Following ongoing COVID-19 restrictions and human subjects research IRB protocols, an unsupervised data collection approach was formulated to collect parent-child interactions (of consenting families) using the LENA device, a lightweight audio recorder. Different interaction scenarios were explored: a book-reading activity at home and spontaneous interactions in a science museum. To distinguish a child's speech from a parent's, we train the diarization models on open-source adult speech data and children's speech data acquired from the Linguistic Data Consortium (LDC). Various speaker embeddings (e.g., x-vectors, i-vectors, and ResNet-based embeddings) will be explored. Results will be reported using the Diarization Error Rate (DER). [Work sponsored by NSF via Grant Nos. 1918032 and 1918012.]
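As a point of reference for the evaluation metric, the following is a minimal frame-level sketch of how the Diarization Error Rate could be computed. The segment format, the 10 ms frame step, and the assumption that hypothesis labels have already been mapped to reference speakers are illustrative choices, not details of this study.

```python
# Minimal frame-level Diarization Error Rate (DER) sketch (illustrative only).
# Segments are (start_sec, end_sec, speaker) tuples; a full implementation
# would first find the optimal reference-to-hypothesis speaker mapping
# (e.g., with the Hungarian algorithm), which is assumed already done here.

def frame_speakers(segments, step=0.01):
    """Map each 10 ms frame to the set of speakers active in that frame."""
    n = int(round(max(end for _, end, _ in segments) / step))
    frames = [set() for _ in range(n)]
    for start, end, spk in segments:
        for i in range(int(round(start / step)), min(int(round(end / step)), n)):
            frames[i].add(spk)
    return frames

def der(reference, hypothesis, step=0.01):
    ref, hyp = frame_speakers(reference, step), frame_speakers(hypothesis, step)
    n = max(len(ref), len(hyp))
    ref += [set()] * (n - len(ref))
    hyp += [set()] * (n - len(hyp))
    miss = false_alarm = confusion = total = 0
    for r, h in zip(ref, hyp):
        total += len(r)
        miss += max(len(r) - len(h), 0)                # reference speech not covered
        false_alarm += max(len(h) - len(r), 0)         # hypothesis speech with no reference
        confusion += min(len(r), len(h)) - len(r & h)  # speech assigned to the wrong speaker
    return (miss + false_alarm + confusion) / max(total, 1)

# Toy example: the parent/child boundary is off by 0.5 s -> DER = 0.10.
reference = [(0.0, 3.0, "parent"), (3.0, 5.0, "child")]
hypothesis = [(0.0, 2.5, "parent"), (2.5, 5.0, "child")]
print(f"DER = {der(reference, hypothesis):.2f}")
```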
Recent developments in deep learning strategies have revolutionized Speech and Language Technologies (SLT). Deep learning models often rely on massive naturalistic datasets to attain the complexity required for superior performance. However, most massive SLT datasets are not publicly available, limiting the potential for academic research. Through this work, we showcase the CRSS-UTDallas-led efforts to recover, digitize, and openly distribute over 50,000 hrs of speech data recorded during the 12 NASA Apollo manned missions, and outline our continuing efforts to digitize and create metadata through diarization of the remaining 100,000 hrs. We present novel deep learning-based speech processing solutions developed to extract high-level information from this massive dataset. The Fearless Steps APOLLO resource is a 50,000 hr audio collection from 30-track analog tapes originally used to document Apollo missions 1, 7, 8, 10, 11, and 13. A customized tape read-head developed to digitize all 30 channels simultaneously has been deployed to expedite digitization of the remaining mission tapes. Diarized transcripts for these unlabeled audio communications have also been generated to facilitate open research across the speech sciences, historical archives, education, and speech technology communities. Robust technologies developed to generate human-readable transcripts include: (i) speaker diarization, (ii) speaker tracking, and (iii) text output from speech recognition systems.
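To illustrate how these three components might be chained into a transcript-generation pipeline, here is a skeleton sketch. The functions `diarize`, `track_speaker`, and `recognize` are hypothetical placeholders and do not correspond to the actual Fearless Steps APOLLO tools.

```python
# Skeleton of a transcript-generation pipeline (hypothetical placeholders only;
# these functions are not the actual Fearless Steps APOLLO components).

from dataclasses import dataclass

@dataclass
class Segment:
    start: float     # seconds
    end: float       # seconds
    speaker: str     # anonymous diarization label, e.g., "spk_3"
    text: str = ""   # filled in by the recognizer

def diarize(wav_path: str) -> list[Segment]:
    """Placeholder: split one audio channel into speaker-homogeneous segments."""
    raise NotImplementedError

def track_speaker(segment: Segment, enrollment: dict[str, list[float]]) -> str:
    """Placeholder: map an anonymous label to a known mission role
    (e.g., Flight Director) by comparing speaker embeddings."""
    raise NotImplementedError

def recognize(wav_path: str, segment: Segment) -> str:
    """Placeholder: run speech recognition on one segment."""
    raise NotImplementedError

def transcribe_channel(wav_path: str, enrollment: dict[str, list[float]]) -> list[Segment]:
    """Chain diarization -> speaker tracking -> ASR for one audio channel."""
    segments = diarize(wav_path)
    for seg in segments:
        seg.speaker = track_speaker(seg, enrollment)
        seg.text = recognize(wav_path, seg)
    return segments
```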
Apollo-11 was the first manned space mission to successfully land astronauts on the Moon. More than 400 mission specialists and support team members were involved, and their voice communications were captured using the SoundScriber multi-channel analog system. To ensure mission success, it was necessary for teams to engage, communicate, learn, address, and solve problems in a timely manner. Hence, in order to identify each speaker's role during the Apollo missions and analyze group communication, we need to tag and track speakers automatically, since manual annotation of such a massive audio corpus is costly and time consuming. In this study, we focus on a 100 h subset derived from the 10,000 h of Fearless Steps Apollo-11 audio data. We use the concept of "Where's Waldo" to identify all instances of our speakers of interest: (i) the three astronauts, (ii) the Flight Director, and (iii) the Capsule Communicator. The analysis of this handful of speakers in the 100 h subset can then be extended to the complete Apollo mission. This analysis provides an opportunity to characterize team communications, group dynamics, and human engagement/psychology. Identifying these personnel can also help pay tribute to the hundreds of notable engineers and scientists who made this scientific accomplishment possible. [Work sponsored by NSF Grant No. 2016725.]
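A minimal sketch of such speaker-of-interest tagging, assuming per-segment speaker embeddings (e.g., x-vectors) have already been extracted, could score each segment against enrollment embeddings of the target speakers. The 192-dimensional embeddings and the 0.6 cosine-similarity threshold below are illustrative assumptions, not values from the study.

```python
# Speaker-of-interest tagging sketch: compare each segment embedding against
# per-speaker enrollment embeddings via cosine similarity. Embeddings are
# assumed pre-extracted; the 0.6 threshold is an illustrative choice.

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def tag_segments(segment_embs: np.ndarray,
                 enrollment: dict[str, np.ndarray],
                 threshold: float = 0.6) -> list[str]:
    """Label each segment with the closest speaker of interest,
    or 'other' if no enrollment embedding is similar enough."""
    labels = []
    for emb in segment_embs:
        scores = {name: cosine(emb, ref) for name, ref in enrollment.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else "other")
    return labels

# Toy usage with random 192-dimensional embeddings (dimension is arbitrary here).
rng = np.random.default_rng(0)
enrollment = {"Flight Director": rng.normal(size=192),
              "Capsule Communicator": rng.normal(size=192)}
segments = rng.normal(size=(5, 192))
print(tag_segments(segments, enrollment))
```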