This paper presents an acoustic study of vowel formant dynamics and the analysis methods developed to carry it out. The main goal of the study was to bring quantitative acoustic evidence to bear on competing theories regarding the source(s) of vowel identity specification [W. Strange, J. Acoust. Soc. Am. 85, 2081–2087 (1989)]. A set of 40 CVC syllables is studied: symmetric voiced stop (/bVb/, /dVd/, /gVg/) and “neutral” (/hVd/) contexts × the ten “monophthongal” vowels of Midwestern American English (/i, ɪ, e, ɛ, æ, ʌ, u, ʊ, o, ɑ/). Three male speakers (speaking normally) contributed two repetitions each. Voice-pulse-by-voice-pulse tracks of the first three formant frequencies were measured using GEMS [J. Talley, J. Acoust. Soc. Am. 90, 2274 (A) (1991)] and LPC. PEACC, a new technique for speech coding using exponential pieces, was then applied to the trajectories to segment them automatically into transitions and to characterize the segments in terms of intuitive parameters: Δf (“locus-to-target” distance), Δt (duration), α (curvature), and f0 (“target” frequency). This paper discusses the characteristics of the resulting data and the results of analyzing initial and final transitions with respect to intracategory similarity and intercategory distinctiveness across a variety of category boundaries. [Work supported by NSF.]
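To make the four segment parameters concrete, the sketch below evaluates a single exponential piece of a formant trajectory. The abstract does not give PEACC's actual functional form, so the normalized-exponential shape, the sign convention for Δf (target minus locus), and the function name `exp_transition` are all assumptions chosen for illustration only.

```python
import math


def exp_transition(t, delta_f, delta_t, alpha, f_target):
    """Formant frequency (Hz) at time t within one hypothetical exponential piece.

    ASSUMPTION: this shape is illustrative, not PEACC's published model.
    delta_f  : signed locus-to-target distance in Hz (target minus locus)
    delta_t  : duration of the transition in seconds
    alpha    : curvature; 0 gives a straight line, larger values bend
               the curve so that most of the movement happens early
    f_target : frequency reached at the end of the transition
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    if not 0.0 <= t <= delta_t:
        raise ValueError("t must lie within [0, delta_t]")
    x = t / delta_t  # normalized time in [0, 1]
    if abs(alpha) < 1e-9:
        progress = x  # linear limit of the exponential as alpha -> 0
    else:
        progress = (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))
    # progress runs from 0 (at the locus) to 1 (at the target)
    return f_target - delta_f * (1.0 - progress)


# Hypothetical F2 transition: locus 1400 Hz, target 1800 Hz, 50 ms long.
samples = [exp_transition(0.01 * k, 400.0, 0.05, 3.0, 1800.0) for k in range(6)]
```

Under these assumptions the piece starts at the locus (f0 − Δf), ends at the target f0, and α alone controls how quickly the trajectory approaches the target, which is one way the four parameters can stay independently interpretable.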
In the three-decade-long debate over static versus dynamic specification of vowels, perceptual studies in which subjects identify naturally spoken vowels under various ablation conditions have been a mainstay. While it does not directly explain how humans recognize this major subclass of phones, this type of study [e.g., Strange, Jenkins, and Johnson, J. Acoust. Soc. Am. 74, 695–705 (1983)] has provided compelling results that must be accounted for in any successful theory of vowel perception. This paper presents results from another perceptual study of human vowel identification under ablation conditions. The study uses CVC syllables spoken rapidly by three male speakers in a carrier sentence. The syllables consist of ten American English vowels in each of four consonantal contexts (b-b, d-d, g-g, and h-d). The conditions studied are silent centers (SC), centers only (CO), and the control condition (full). A very robust hierarchy of full > CO > SC is found. The consonantal contexts also show a clear ordering (h-d > b-b > d-d > g-g) with respect to the ease with which vowels are identified in them. Interesting interactions between vowels and their contexts are also evident.