Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker’s /p/ might be physically indistinguishable from another talker’s /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively non-stationary world and propose that the speech perception system overcomes this challenge by (1) recognizing previously encountered situations, (2) generalizing to other situations based on previous similar experience, and (3) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (1) to (3) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on two critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit against known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these two aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension.
Two visual-world experiments examined listeners’ use of pre word-onset anticipatory coarticulation in spoken-word recognition. Experiment 1 established the shortest lag with which information in the speech signal influences eye-movement control, using stimuli such as “The … ladder is the target”. With a neutral token of the definite article preceding the target word, saccades to the referent were not more likely than saccades to an unrelated distractor until 200–240 ms after the onset of the target word. In Experiment 2, utterances contained definite articles which contained natural anticipatory coarticulation pertaining to the onset of the target word (“ The ladder … is the target”). A simple Gaussian classifier was able to predict the initial sound of the upcoming target word from formant information from the first few pitch periods of the article’s vowel. With these stimuli, effects of speech on eye-movement control began about 70 ms earlier than in Experiment 1, suggesting rapid use of anticipatory coarticulation. The results are interpreted as support for “data explanation” approaches to spoken-word recognition. Methodological implications for visual-world studies are also discussed.
We present a framework of second and additional language (L2/Ln) acquisition motivated by recent work on socio-indexical knowledge in first language (L1) processing. The distribution of linguistic categories covaries with socio-indexical variables (e.g., talker identity, gender, dialects). We summarize evidence that implicit probabilistic knowledge of this covariance is critical to L1 processing, and propose that L2/Ln learning uses the same type of socio-indexical information to probabilistically infer latent hierarchical structure over previously learned and new languages. This structure guides the acquisition of new languages based on their inferred place within that hierarchy, and is itself continuously revised based on new input from any language. This proposal unifies L1 processing and L2/Ln acquisition as probabilistic inference under uncertainty over socio-indexical structure. It also offers a new perspective on crosslinguistic influences during L2/Ln learning, accommodating gradient and continued transfer (both negative and positive) from previously learned to novel languages, and vice versa.
Social and linguistic perceptions are linked. On one hand, talker identity affects speech perception. On the other hand, speech itself provides information about a talker's identity. Here, we propose that the same probabilistic knowledge might underlie both socially conditioned linguistic inferences and linguistically conditioned social inferences. Our computational-level approach-the ideal adapter-starts from the idea that listeners use probabilistic knowledge of covariation between social, linguistic, and acoustic cues in order to infer the most likely explanation of the speech signals they hear. As a first step toward understanding social inferences in this framework, we use a simple ideal observer model to show that it would be possible to infer aspects of a talker's identity using cue distributions based on actual speech production data. This suggests the possibility of a single formal framework for social and linguistic inferences and the interactions between them.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.