There is considerable debate about whether the early processing of sounds depends on whether they form part of speech. Proponents of such speech specificity postulate the existence of language-dependent memory traces, which are activated in the processing of speech but not when equally complex, acoustic non-speech stimuli are processed. Here we report the existence of these traces in the human brain. We presented to Finnish subjects the Finnish phoneme prototype /e/ as the frequent stimulus, and other Finnish phoneme prototypes or a non-prototype (the Estonian prototype /õ/) as the infrequent stimulus. We found that the brain's automatic change-detection response, reflected electrically as the mismatch negativity (MMN), was enhanced when the infrequent, deviant stimulus was a prototype (the Finnish /ö/) relative to when it was a non-prototype (the Estonian /õ/). These phonemic traces, revealed by MMN, are language-specific, as /õ/ caused enhancement of MMN in Estonians. Whole-head magnetic recordings located the source of this native-language, phoneme-related response enhancement, and thus the language-specific memory traces, in the auditory cortex of the left hemisphere.
This paper describes an HMM-based speech synthesizer that utilizes glottal inverse filtering for generating natural-sounding synthetic speech. In the proposed method, speech is first decomposed into the glottal source signal and the model of the vocal tract filter through glottal inverse filtering, and thus parametrized into excitation and spectral features. The source and filter features are modeled individually in the framework of HMM and generated in the synthesis stage according to the text input. The glottal excitation is synthesized by interpolating and concatenating natural glottal flow pulses, and the excitation signal is further modified according to the spectrum of the desired voice source characteristics. Speech is synthesized by filtering the reconstructed source signal with the vocal tract filter. Experiments show that the proposed system is capable of generating natural-sounding speech, and that its quality is clearly better than that of two HMM-based speech synthesis systems based on widely used vocoder techniques.
Index Terms: Speech synthesis, glottal inverse filtering, hidden Markov model.
EDICS Category: SPE-SYNT
I. INTRODUCTION
The ultimate goal of speech synthesis is to create natural-sounding spoken expression from arbitrary text. This calls for the ability to synthesize high-quality speech, and also for a means to vary the speech characteristics appropriately according to the speaker, context, and emotion. The first criterion can be met with a synthesis scheme that concatenates segments of pre-recorded speech. However, these so-called unit selection-based systems are known to suffer from limitations in their ability to vary the speech characteristics [1].
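The final synthesis step described in the abstract above, filtering a source signal with a vocal tract filter, can be illustrated with a minimal source-filter sketch. This is not the authors' system: the excitation here is a plain impulse train rather than interpolated natural glottal flow pulses, and the vocal tract is a single two-pole resonance rather than a filter estimated by glottal inverse filtering. All function names and numeric values (`pulse_train`, `resonator`, `all_pole_filter`, the 16 kHz sampling rate, the 500 Hz resonance) are hypothetical choices made for illustration.

```python
import math


def pulse_train(n_samples, period):
    """Crude stand-in for a glottal excitation: one unit impulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(freq_hz, bandwidth_hz, fs):
    """Denominator coefficients [a1, a2] of a two-pole resonance (one formant-like peak)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius, below 1 so the filter is stable
    theta = 2.0 * math.pi * freq_hz / fs         # pole angle
    return [-2.0 * r * math.cos(theta), r * r]


def all_pole_filter(excitation, a):
    """Filter the excitation through 1 / (1 + a[0] z^-1 + a[1] z^-2 + ...)."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc -= coeff * y[n - k]
        y.append(acc)
    return y


fs = 16000                                        # assumed sampling rate in Hz
source = pulse_train(n_samples=800, period=160)   # 100 Hz pulse rate at 16 kHz
vocal_tract = resonator(500.0, 80.0, fs)          # hypothetical single resonance
speech = all_pole_filter(source, vocal_tract)     # source signal shaped by the filter
```

In a parametric synthesizer of the kind the abstract describes, both the excitation and the filter coefficients would vary over time and come from the statistical models; the sketch holds them fixed only to show how the two parts combine.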
Hidden Markov model (HMM)-based parametric speech synthesis techniques [1]-[4], in turn, are very flexible and can be adapted [5] or modified [6] to generate speech according to virtually any criterion related to varying
Manuscript received Month Day, Year; revised Month Day, Year. This project is supported by Nokia and the Academy of Finland (projects 111848, 107606). J. Yamagishi is funded by the Engineering and Physical Sciences Research Council (EPSRC) and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 213845 (the EMIME project). The associate editor coordinating the review of this manuscript for publication was Prof.
The present study was motivated by a theory proposing that speech includes articulatory gestures that are connected to particular hand actions. We hypothesized that certain articulatory gestures would be more associated with the precision grip than with the power grip, and vice versa. In the study, the participants pronounced a syllable while simultaneously performing a precision or power grip that was theorized to be either congruent or incongruent with the syllable. Relatively fast precision grip responses were associated with articulatory gestures in which the tip of the tongue contacted the alveolar ridge ([te]) or the aperture of the vocal tract remained small ([hi]), as well as gestures that required lip protrusion ([pu]). In contrast, relatively fast power grip responses were associated with gestures that were produced by moving the back of the tongue against the velum ([ke]) or in which the aperture of the vocal tract remained large ([hα]). In addition to demonstrating that certain articulatory gestures are systematically connected to different grip types, the study may shed some light on the discussion concerning sound symbolism and the evolution of speech.
A unique feature of the human communication system is our ability to rapidly acquire new words and build large vocabularies. However, the neurobiological foundations of this ability remain largely unknown. In an electrophysiological study optimally designed to probe this rapid formation of new word memory circuits, we employed acoustically controlled novel word-forms incorporating native and non-native speech sounds, while manipulating the subjects' attention to the input. We found a robust index of neurolexical memory-trace formation: a rapid enhancement of the brain's activation elicited by novel words during a short (~30 min) perceptual exposure, underpinned by fronto-temporal cortical networks, and, importantly, correlated with behavioural learning outcomes. Crucially, this neural memory trace build-up took place regardless of focused attention on the input or any pre-existing or learnt semantics. Furthermore, it was found only for stimuli with native-language phonology, and not for acoustically closely matching non-native words. These findings demonstrate a specialised cortical mechanism for rapid, automatic and phonology-dependent formation of neural word memory circuits.