Cognitive systems face a tension between stability and plasticity. The maintenance of long‐term representations that reflect the global regularities of the environment is often at odds with pressure to flexibly adjust to short‐term input regularities that may deviate from the norm. This tension is abundantly clear in speech communication when talkers with accents or dialects produce input that deviates from a listener's language community norms. Prior research demonstrates that when bottom‐up acoustic information or top‐down word knowledge is available to disambiguate speech input, there is short‐term adaptive plasticity such that subsequent speech perception is shifted even in the absence of the disambiguating information. Although such effects are well‐documented, it is not yet known whether bottom‐up and top‐down resolution of ambiguity may operate through common processes, or how these information sources may interact in guiding the adaptive plasticity of speech perception. The present study investigates the joint contributions of bottom‐up information from the acoustic signal and top‐down information from lexical knowledge in the adaptive plasticity of speech categorization according to short‐term input regularities. The results implicate speech category activation, whether from top‐down or bottom‐up sources, in driving rapid adjustment of listeners' reliance on acoustic dimensions in speech categorization. Broadly, this pattern of perception is consistent with dynamic mapping of input to category representations that is flexibly tuned according to interactive processing accommodating both lexical knowledge and idiosyncrasies of the acoustic input.
Speech perception presents an exemplary model of how neurobiological systems flexibly adjust when input departs from the norm. Dialects, accents, and even head colds can impair comprehension by shifting speech away from listeners' expectations. Comprehension improves with exposure to shifted speech regularities, but there is no neurobiological model of this rapid learning. We used electroencephalography to examine human auditory cortical responses to utterances that varied only in fundamental frequency (F0, perceived as voice pitch) as we manipulated the statistical distributions of speech acoustics across listening contexts. Participants overtly categorized speech sampled across two acoustic dimensions that distinguish /b/ from /p/ (voice onset time [VOT] and F0), with the samples modeling either typical English speech regularities or an expectation-violating accent. These blocks were interleaved with passive exposure to two F0-distinguished test stimuli presented in an oddball ratio to elicit a cortical mismatch negativity (MMN) response. F0 robustly influenced speech categorization when short-term regularities aligned with English, but exerted no influence in the context of the accent. The short-term regularities modulated event-related potentials evoked by the F0-distinguished test stimuli across both N1 and P3 temporal windows and, for P3 amplitude, there was a strong correlation with perceptual down-weighting of F0. The influence of the short-term regularities persisted into the interleaved passive listening blocks, impacting the MMN when regularities mirrored English but not when they conveyed the accent. Thus, cortical response is modulated as a function of the statistical regularities of the listening context, likely reflecting both early dimension encoding and later categorization.
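The passive-listening design described above hinges on presenting two test stimuli in an oddball ratio, with frequent standards and rare deviants, to elicit the MMN. As an illustration only, the sketch below generates such a trial sequence; the specific parameter values (200 trials, a 15% deviant rate, a no-adjacent-deviants constraint, and an initial run of standards) are common conventions in oddball paradigms and are assumptions here, not details reported in the abstract.

```python
import random

def oddball_sequence(n_trials=200, deviant_prob=0.15, warmup=5, seed=0):
    """Build a standard/deviant trial sequence for an MMN oddball paradigm.

    Assumed constraints (typical of oddball designs, not taken from the
    study): the sequence opens with `warmup` standards, and deviants
    never occur on consecutive trials.
    """
    rng = random.Random(seed)
    seq = []
    for i in range(n_trials):
        if i < warmup or (seq and seq[-1] == "deviant"):
            seq.append("standard")  # warm-up trials and no back-to-back deviants
        else:
            seq.append("deviant" if rng.random() < deviant_prob else "standard")
    return seq

seq = oddball_sequence()
deviant_rate = seq.count("deviant") / len(seq)
```

Because the adjacency constraint suppresses some would-be deviants, the realized deviant rate lands somewhat below the nominal probability; in practice one would tune `deviant_prob` to hit the target ratio exactly.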
Speech learning involves discovering appropriate functional speech units (e.g., speech categories) embedded in a continuous stream of speech. However, speech category learning has been investigated mostly with isolated sound tokens. Here, we used a videogame to encourage incidental learning of speech categories from continuous speech input (Lim et al., 2015). Native English-speaking participants (N = 17) played the videogame while listening to acoustically variable continuous Mandarin sentences. Unbeknownst to participants, four acoustically variable Mandarin keywords were embedded in the continuous sentences. During training, participants were not informed about the keywords, made no overt categorization decisions, and received no feedback. Participants' post-training categorization test demonstrated robust incidental learning of the keywords that persisted even 10 days after training and generalized to novel utterances and talkers. Further, the N100 response at the frontal EEG site evoked by keyword onsets within continuous Mandarin speech during passive listening was greater post-training than pre-training. This neural enhancement was specific to the Mandarin keywords that were functionally useful in videogame training. Our results demonstrate that although participants were not informed about the keywords and made no overt categorization decisions during videogame play, they incidentally learned functionally relevant non-native speech categories from continuous speech input across considerable acoustic variability.