We present numerical methods for studying the relationship between the shape of the vocal tract and its acoustic output. For a stationary vocal tract, the articulatory-acoustic relationship can be represented as a multidimensional function of a multidimensional argument: y=f(x), where x and y are vectors describing the vocal-tract shape and the resulting acoustic output, respectively. Assuming that y may be computed for any x, we develop a procedure for inverting f(x). Inversion by computer sorting consists of computing y for many values of x and sorting the resulting (y,x) pairs into a convenient order according to y; x for a given y is then obtained by looking up y in the sorted data. We apply this method to determine the parameters of an articulatory model that correspond to a given set of formant frequencies. A method is also described for finding articulatory regions (fibers) that map onto a single point in the acoustic space. The local behavior of f(x) is determined by linearization in a small neighborhood. Larger regions are explored by extending the linear neighborhoods in small steps. This method was applied to the study of compensatory articulation. Sounds produced by various articulations along a fiber were synthesized and compared in informal listening tests. These tests show that, in many cases of interest, a given sound could be produced by many different vocal-tract shapes.
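The inversion-by-sorting idea above can be sketched in a few lines. This is a minimal illustration, not the authors' program: `toy_forward` is a hypothetical stand-in for f(x), and quantizing y into bins of width `bin_width` before sorting is an assumption made here so that nearby acoustic outputs share a lookup key.

```python
import numpy as np

def toy_forward(x):
    """Hypothetical stand-in for y = f(x): 3 'articulatory' inputs, 2 'acoustic' outputs."""
    x = np.atleast_2d(x)
    y1 = x[:, 0] + x[:, 1] * x[:, 2]
    y2 = x[:, 0] * x[:, 1] + x[:, 2]
    return np.stack([y1, y2], axis=1)

def build_table(grid_axes, bin_width):
    """Compute y for every x on a uniform grid and sort the (y, x) pairs by quantized y."""
    mesh = np.meshgrid(*grid_axes, indexing="ij")
    xs = np.stack([m.ravel() for m in mesh], axis=1)
    ys = toy_forward(xs)
    keys = np.floor(ys / bin_width).astype(np.int64)
    # np.lexsort treats the LAST key as primary, so this sorts by keys[:, 0], then keys[:, 1].
    order = np.lexsort((keys[:, 1], keys[:, 0]))
    return xs[order], ys[order], keys[order]

def lookup(table, y_target, bin_width):
    """Return every stored x whose y falls in the same bin as y_target."""
    xs, ys, keys = table
    k = np.floor(np.asarray(y_target, dtype=float) / bin_width).astype(np.int64)
    # Binary search on the primary key, then filter on the secondary key.
    lo = np.searchsorted(keys[:, 0], k[0], side="left")
    hi = np.searchsorted(keys[:, 0], k[0], side="right")
    sel = lo + np.nonzero(keys[lo:hi, 1] == k[1])[0]
    return xs[sel], ys[sel]

# Usage: tabulate 11^3 shapes, then ask which of them produce y near (0.75, 0.75).
axes = [np.linspace(0.0, 1.0, 11)] * 3
table = build_table(axes, bin_width=0.1)
candidates, outputs = lookup(table, [0.75, 0.75], bin_width=0.1)
```

A lookup may return several candidates for one y, which is consistent with the abstract's observation that many shapes can map to the same acoustic output.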
We present a method of constructing various vocal-tract shapes having identical acoustical characteristics. A 20-dimensional vocal-tract model is used to compute the frequencies, amplitudes, and bandwidths of the first five formants. The dimensions consist of 20 uniformly spaced cross-sectional areas of the vocal tract. The model includes losses due to finite wall impedance, glottal leakage, friction, heat conduction, and radiation. Articulatory regions (fibers) are computed in which the frequencies and amplitudes of the first three formants remain constant. In general these fibers contain 14 dimensions, not all of which provide physiologically reasonable area profiles. A theory for selecting among the dimensions within a fiber according to maximum physical smoothness of the vocal tract is presented. Application of the theory to the 14-dimensional data indicates that 2 or 3 of the 14 dimensions correspond to physically realizable perturbations in the vocal tract, and that along these dimensions the vocal-tract shape may be changed without changing the three formant frequencies and amplitudes. Almost no audible change in the sound is produced by these articulatory variations. Examples of synthetic speech illustrating these results will be played.
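The dimension count above follows from linearization: holding 6 quantities fixed (3 formant frequencies and 3 amplitudes) in a 20-dimensional space leaves, in general, 20 - 6 = 14 free directions. The sketch below illustrates this with a random 6-by-20 matrix standing in for the Jacobian of the acoustic map, which is purely an assumption. Ranking the free directions by second-difference energy is one plausible reading of "smoothness", chosen here for illustration; the abstract does not state the authors' actual criterion.

```python
import numpy as np

def null_space_basis(J, rtol=1e-10):
    """Orthonormal basis (as columns) of directions that leave the outputs unchanged to first order."""
    _, s, Vt = np.linalg.svd(J, full_matrices=True)
    rank = int(np.sum(s > rtol * s[0]))
    return Vt[rank:].T

def smoothest_directions(N):
    """Rotate the null-space basis so its columns are ordered from smoothest to roughest.

    Roughness of a direction v is taken as ||D v||^2, where D is the discrete
    second-difference operator (an assumed smoothness measure).
    """
    n = N.shape[0]
    D = np.diff(np.eye(n), n=2, axis=0)   # shape (n - 2, n)
    R = N.T @ D.T @ D @ N                 # roughness restricted to the null space
    roughness, V = np.linalg.eigh(R)      # eigenvalues in ascending order
    return N @ V, roughness

# Stand-in Jacobian: 6 acoustic constraints, 20 cross-sectional areas (random, for illustration only).
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 20))

N = null_space_basis(J)                   # 20 x 14 for a full-rank J
directions, roughness = smoothest_directions(N)
```

The first few columns of `directions` are the smoothest area perturbations that keep the six constrained quantities fixed to first order.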
A method is presented for determining the parameters of an articulatory model corresponding to a given set of formant frequencies. A four-parameter articulatory model of the vocal tract was used to define the vocal-tract shape. The articulatory parameters were (a) location of constriction, (b) area of constriction, (c) area of mouth opening, and (d) length of the vocal tract. The model included losses due to finite wall impedance, glottal leakage, friction, heat conduction, and radiation. Frequencies, amplitudes, and bandwidths of five formants were computed for each of 30 720 vocal-tract shapes obtained by uniform sampling along each of the four articulatory dimensions. The data were sorted according to the first three formant frequencies, so that the articulator positions yielding given values of these formant frequencies can be looked up in the sorted data. One-dimensional fibers (curved lines) were computed along which the articulator positions can vary without producing changes in the three formant frequencies. These fibers span large distances in the articulatory space. Sounds produced by various articulations along a fiber were computed and compared in informal listening tests.
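With four articulatory parameters and three formant frequencies held fixed, one direction remains free at each point, which is why the fibers are curves. One way to follow such a curve is a predictor-corrector scheme: step along the null direction of the local Jacobian, then pull the point back onto the fiber with a few Newton iterations. The sketch below is an assumed illustration of that idea, not the authors' procedure. `toy_formants` is a hypothetical map chosen so that its fibers are known to be circles.

```python
import numpy as np

def toy_formants(x):
    """Hypothetical map from 4 parameters to 3 'formants' (its fibers are circles)."""
    x1, x2, x3, x4 = x
    return np.array([x1**2 + x2**2, x3 - x1, x4 - x2])

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        cols.append((f(x + e) - f(x - e)) / (2.0 * h))
    return np.stack(cols, axis=1)

def trace_fiber(f, x0, step=0.05, n_steps=40, newton_iters=5):
    """Follow the curve f(x) = f(x0) starting from x0."""
    x = np.asarray(x0, dtype=float)
    y0 = f(x)
    path = [x.copy()]
    prev_t = None
    for _ in range(n_steps):
        # Predictor: the last right-singular vector of the 3x4 Jacobian spans its null space.
        t = np.linalg.svd(jacobian(f, x))[2][-1]
        if prev_t is not None and t @ prev_t < 0:
            t = -t                         # keep a consistent direction of travel
        x_new = x + step * t
        # Corrector: minimum-norm Newton steps back onto the fiber.
        for _ in range(newton_iters):
            x_new = x_new - np.linalg.pinv(jacobian(f, x_new)) @ (f(x_new) - y0)
        x, prev_t = x_new, t
        path.append(x.copy())
    return np.array(path), y0

path, y0 = trace_fiber(toy_formants, [1.0, 0.0, 0.3, -0.2])
```

Each row of `path` is a different parameter setting that yields the same three outputs `y0`, mirroring the way articulator positions can vary along a fiber without changing the formant frequencies.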
In this study, a computationally efficient articulatory synthesizer that utilizes the popular analog circuit simulator SPICE is developed. The synthesizer uses a transmission-line analog model of the vocal tract. An analog model has several advantages over digital representations: (1) side branches (needed for modeling nasals and /l/) can be simulated easily by additional transmission lines in parallel; (2) drive-dependent sources can be added at any location; and (3) the number of sections can be varied without changing the sampling rate, which is not the case with a digital synthesizer. A computer interface, using MATLAB, is developed such that the input to the synthesizer can be specified in terms of the area function of the vocal tract and the type and location of dependent or independent sources (voltage or current). By simulating the transfer function of the vocal tract, transient and steady-state responses are generated. Using Fant’s vowel area functions (1960), vowels were synthesized with their first four formant frequencies almost identical to those given by Fant. The feasibility of implementing the analog synthesizer using modern ICs, such as the gyrator-based inductance simulator and switched capacitor filter circuits, is assessed.
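The transmission-line view of the vocal tract can be illustrated without a circuit simulator by cascading the chain (ABCD) matrices of short tube sections in the frequency domain. The sketch below is not the SPICE/MATLAB synthesizer described above; it assumes lossless sections, an ideal open termination at the lips (zero pressure), and illustrative values for air density and sound speed. Under those assumptions the glottis-to-lips volume-velocity transfer function is 1/D, where D is the lower-right entry of the cascaded matrix, so formants appear where |D| dips toward zero.

```python
import numpy as np

RHO = 1.2    # air density in kg/m^3 (assumed value)
C = 350.0    # speed of sound in m/s (assumed value)

def cascade_D(areas_m2, section_len_m, freqs_hz):
    """Lower-right chain-matrix entry of a cascade of lossless tube sections, per frequency."""
    k = 2.0 * np.pi * freqs_hz / C
    cs = np.cos(k * section_len_m)
    sn = np.sin(k * section_len_m)
    M = np.tile(np.eye(2, dtype=complex), (freqs_hz.size, 1, 1))
    for area in areas_m2:                  # ordered from glottis to lips
        Z = RHO * C / area                 # characteristic acoustic impedance of the section
        S = np.empty((freqs_hz.size, 2, 2), dtype=complex)
        S[:, 0, 0] = cs
        S[:, 0, 1] = 1j * Z * sn
        S[:, 1, 0] = 1j * sn / Z
        S[:, 1, 1] = cs
        M = M @ S                          # batched 2x2 matrix product, one per frequency
    return M[:, 1, 1]

def formants(areas_m2, section_len_m, freqs_hz):
    """Frequencies at which |D| has a local minimum, i.e. peaks of |1/D|."""
    d = np.abs(cascade_D(areas_m2, section_len_m, freqs_hz))
    is_min = (d[1:-1] < d[:-2]) & (d[1:-1] < d[2:])
    return freqs_hz[1:-1][is_min]

# Usage: a uniform 17.5 cm tube split into 20 sections of 5 cm^2 each (illustrative values).
freqs = np.arange(50.0, 4000.0, 5.0)
uniform_areas = np.full(20, 5e-4)
resonances = formants(uniform_areas, 0.175 / 20, freqs)
```

For a uniform tube the result can be checked against the quarter-wavelength resonances at odd multiples of c/4L. Replacing `uniform_areas` with a measured area function gives the corresponding lossless estimate. The number of sections is just the length of the area array, independent of the frequency grid.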