The INTERSPEECH 2019 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Styrian Dialects Sub-Challenge, three types of Austrian-German dialects have to be classified; in the Continuous Sleepiness Sub-Challenge, the sleepiness of a speaker has to be assessed as a regression problem; in the Baby Sound Sub-Challenge, five types of infant sounds have to be classified; and in the Orca Activity Sub-Challenge, orca sounds have to be detected. We describe the Sub-Challenges and baseline feature extraction and classifiers, which include data-learnt (supervised) feature representations by the 'usual' ComParE and BoAW features, and deep unsupervised representation learning using the AUDEEP toolkit.
Recent advances in large-scale data storage and processing offer unprecedented opportunities for behavioral scientists to collect and analyze naturalistic data, including from underrepresented groups. Audio data, particularly real-world audio recordings, are of particular interest to behavioral scientists because they provide high-fidelity access to subtle aspects of daily life and social interactions. However, these methodological advances pose novel risks to research participants and communities. In this article, we outline the benefits and challenges associated with collecting, analyzing, and sharing multi-hour audio recording data. Guided by the principles of autonomy, privacy, beneficence, and justice, we propose a set of ethical guidelines for the use of longform audio recordings in behavioral research. This article is also accompanied by an Open Science Framework Ethics Repository that includes informed consent resources such as frequent participant concerns and sample consent forms.
This study evaluates whether early vocalizations develop in similar ways in children across diverse cultural contexts. We analyze data from daylong audio-recordings of 49 children (1-36 months) from five different language/cultural backgrounds. Citizen scientists annotated these recordings to determine if child vocalizations contained canonical transitions or not (e.g., "ba" versus "ee"). Results revealed that the proportion of clips reported to contain canonical transitions increased with age. Further, this proportion exceeded 0.15 by around 7 months, replicating and extending previous findings on canonical vocalization development but using data from the natural environments of a culturally and linguistically diverse sample. This work explores how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Lower inter-annotator reliability on the crowdsourcing platform, relative to more traditional in-lab expert annotators, means that a larger number of unique annotators and/or annotations are required and that crowdsourcing may not be a suitable method for more fine-grained annotation decisions. Audio clips used for this project are compiled into a large-scale infant vocal corpus that is available for other researchers to use in future work.
This study evaluates whether babbling emerges similarly in children across diverse cultural contexts. We analyze data from daylong audio-recordings of 52 children (1-36 months) from six different language/cultural backgrounds. Citizen scientists annotated these recordings to determine if child vocalizations were canonical or not (e.g., "ba" versus "ee"). Results revealed that canonical babble increased with age. Further, a 0.15 canonical babble ratio emerged around 7 months, replicating and extending previous findings with data from the natural environments of a culturally and linguistically diverse sample. This work exemplifies how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Audio clips used for this project are compiled into a large-scale infant babble corpus that is available for other researchers to use in future work.
This study examined a potential lexicality advantage in young children's early speech production: do children produce sound sequences less accurately in nonwords than in real words? Children aged 3;3-4;4 completed two tasks: a real word repetition task and a corresponding nonword repetition task. Each of the 23 real words had a paired consonant-vowel sequence in the nonword in word-initial position (e.g., ‘su’ in [ˈsutkes] ‘suitcase’ and [ˈsudrɑs]). The word-initial consonant-vowel sequences were kept constant between the paired words. Previous work on this topic compared different sequences of paired sounds, making it hard to determine if those results were due to a lexical or phonetic effect. Our results show that children reliably produced consonant-vowel sequences in real words more accurately than in nonwords. The effect was most pronounced in children with smaller receptive vocabularies. Together, these results reinforce theories arguing for interactions between vocabulary size and phonology in language development.