Man could not perceive speech well if each phoneme were cued by a unit sound. In fact, many phonemes are encoded so that a single acoustic cue carries information in parallel about successive phonemic segments. This reduces the rate at which discrete sounds must be perceived, but at the price of a complex relation between cue and phoneme: cues vary greatly with context, and there are, in these cases, no commutable acoustic segments of phonemic size. Phoneme perception therefore requires a special decoder. A possible model supposes that the encoding occurs below the level of the (invariant) neuromotor commands to the articulatory muscles. The decoder may then identify phonemes by referring the incoming speech sounds to those commands.
Synthetic methods applied to isolated syllables have permitted a systematic exploration of the acoustic cues to the perception of some of the consonant sounds. Methods, results, and working hypotheses are discussed.

The program of research on which we are engaged was described in general terms at the preceding Speech Communication Conference. As we pointed out there, and in more detail in another paper, our work on the perception of speech was based on the assumption that we would have a flexible and convenient experimental method if we could use a spectrographic display to control or manipulate speech sounds. Workers at the Bell Telephone Laboratories had developed the sound spectrograph, which made it instrumentally feasible to obtain spectrograms of relatively long samples of connected speech, and it had become evident that the spectrographic transform has important advantages over the oscillogram as a way of displaying speech sounds to the eye. We were interested in using the spectrogram, not merely as a representation of speech sounds, but also as a basis for modifying and, in the extreme case, creating them. For that purpose we built a machine called a pattern playback, which converts spectrographic pictures into sound, using either photographic copies of actual spectrograms or, alternatively, "synthetic" patterns which are painted by hand on a cellulose acetate base. Having determined first that the playback would speak quite intelligibly from photographic copies of actual spectrograms, we proceeded to prepare hand-painted patterns of test sentences which were, by comparison with the original spectrograms, very highly simplified (see Fig. 1).
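The conversion of a painted pattern into sound can be illustrated with a minimal sketch. It is not a description of the actual machine: it assumes the pattern is a grid of painted intensities, with one row per time frame and one column per harmonic of a fixed fundamental, and it simply sums sine waves. The function name `playback`, the 120 cps fundamental, the 10 ms frame, and the 16 kHz sample rate are all illustrative assumptions, not values taken from the text.

```python
import math


def playback(pattern, frame_dur=0.01, f0=120.0, sample_rate=16000):
    """Render a painted pattern as a sum of harmonics of f0.

    pattern[t][k] is the painted intensity (0..1) of harmonic k+1 during
    time frame t. All parameter defaults are hypothetical choices made
    for illustration only. Returns a list of float samples.
    """
    samples_per_frame = int(round(frame_dur * sample_rate))
    out = []
    for t, frame in enumerate(pattern):
        for i in range(samples_per_frame):
            time_s = (t * samples_per_frame + i) / sample_rate
            # Only painted (non-zero) cells contribute sound.
            sample = sum(
                amp * math.sin(2 * math.pi * f0 * (k + 1) * time_s)
                for k, amp in enumerate(frame)
                if amp
            )
            out.append(sample)
    return out


# One silent frame, then one frame with only the first harmonic painted.
samples = playback([[0, 0], [1, 0]])
```

In this sketch a blank frame yields silence and a painted cell yields a tone at the corresponding harmonic, which is the sense in which a simplified hand-painted pattern can stand in for a full spectrogram.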
In drawing the hand-painted spectrograms we tried, as the first step, to reproduce as well as we could those aspects of the original pattern which were most apparent to the eye, and then, by working back and forth between hand-painted spectrogram and sound, we modified the patterns, usually by trial and error, until the simplified spectrograms were rather highly intelligible.

The work with simplified spectrograms did not provide unequivocal answers to questions about the minimal and invariant patterns for the various sounds of speech, but it did enable us to develop our techniques, and, further, it suggested certain specific problems which appeared to warrant more systematic investigation. In our research on these problems we have departed from the procedure of progressively simplifying the spectrograms of actual speech and have undertaken instead to study the effects on perception of variations in isolated acoustic elements or patterns. Thus, we can hope to determine the separate contributions to the perception of speech of several acoustic variables and, ultimately, to learn how they can be combined to best effect.

STOP CONSONANTS: BURSTS OF NOISE

A careful inspect...

* This research was made possible in part by funds granted by the Carnegie Corporation of New York and in part through the support of the Department of Defense in connection with Contract DA49-170-sc-274.
Deals with certain misunderstandings on which H. L. Lane (see 39:5) based his criticism of data that bear on a motor theory of speech perception. Lane criticized experiments that had demonstrated contrasting tendencies toward "categorical" perception of stop consonants and "continuous" perception of vowels and nonspeech sounds. He also undertook to demonstrate that categorical perception of nonspeech sounds can be produced by the ordinary procedures of discrimination training, and so to refute the claim that such perception is an interesting characteristic of the speech mode. It is shown that, contrary to Lane's claim, discrimination training is not sufficient to produce categorical perception. (34 ref.)
Previous studies with synthetic speech have shown that second-formant transitions are cues for the perception of the stop and nasal consonants. The results of those experiments can be simplified if it is assumed that each consonant has a characteristic and fixed frequency position, or locus, for the second formant, corresponding to the relatively fixed place of production of the consonant. On that basis, the transitions may be regarded as "movements" from the locus to the steady state of the vowel.

The experiments reported in this paper provide additional evidence concerning the existence and positions of these second-formant loci for the voiced stops, b, d, and g. There appears to be a locus for d at 1800 cps and for b at 720 cps. A locus for g can be demonstrated only when the adjoining vowel has its second formant above about 1200 cps; below that level no g locus was found.

The results of these experiments indicate that, for the voiced stops, the transition cannot begin at the locus and go from there to the steady-state level of the vowel. Rather, if we are to hear the appropriate consonant, the first part of the transition must be silent. The voiced stops are best synthesized by making the duration of the silent interval equal to the duration of the transition itself.

An experiment on the first formant revealed that its locus is the same for b, d, and g.

In an earlier experiment we undertook to find out whether the transitions (frequency shifts) of the second formant--often seen in spectrograms in the region where consonant and vowel join--can be cues for the identification of the voiced stop consonants. For that purpose we prepared a series of simplified, hand-painted spectrograms of transition-plus-vowel, then converted these patterns into sound and played the recordings to naive listeners for judgment as b, d, or g.
The agreement among the listeners was, in general, sufficient to show that transitions of the second formant can serve as cues for the identification of the stops and, also, to enable us to select, for each vowel, the particular transitions that best produced each of the stop consonant phones. These transitions are shown in Fig. 1.

We found in further experiments² that these same second-formant transitions can serve as cues for the unvoiced stops (p-t-k) and the nasal consonants (m-n-ŋ), provided, of course, that the synthetic patterns are otherwise changed to contain appropriate acoustic cues for the voiceless and nasal manners of production. Moreover, and more important for the purposes of this paper, the results of these experiments plainly indicated a relationship between second-formant transition and articulatory place of production. Thus, the same second-formant transitions that had been found to produce b proved to be appropriate also for the synthesis of p and m, which, like b, are articulated at the lips; the second-formant transitions that produced d produced the consonants t and n, which have in

Liberman, Delattre, Cooper, and Gerstman, Psychol. Monogr. 68, No. 8, 1-13 (1954).

forma...
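The locus account summarized in the abstract above can be made concrete with a small sketch. It assumes, purely for illustration, that the second formant moves in a straight line from the consonant locus to the vowel's steady state, and it applies the reported rule that the silent interval equals the audible transition in duration. The loci for d (1800 cps) and b (720 cps) come from the abstract; the function name `f2_track`, the 50 ms transition, the 10 ms step, and the 1200 cps vowel value are hypothetical.

```python
D_LOCUS_CPS = 1800.0  # locus for d, as reported in the abstract
B_LOCUS_CPS = 720.0   # locus for b, as reported in the abstract


def f2_track(locus_cps, vowel_f2_cps, transition_ms=50.0, step_ms=10.0):
    """Return (time_ms, frequency_cps, audible) points for a second formant.

    Assumes a linear path from the locus to the vowel steady state (an
    illustrative simplification). The silent interval is set equal to the
    audible transition, so the first half of the path is not sounded.
    """
    silent_ms = transition_ms
    total_ms = silent_ms + transition_ms
    n_steps = int(round(total_ms / step_ms))
    points = []
    for i in range(n_steps + 1):
        t = i * step_ms
        freq = locus_cps + (vowel_f2_cps - locus_cps) * (t / total_ms)
        points.append((t, freq, t >= silent_ms))
    return points


# Hypothetical vowel with its second formant at 1200 cps, preceded by d.
track = f2_track(D_LOCUS_CPS, 1200.0)
audible = [p for p in track if p[2]]
```

Under these assumptions the audible part of the transition starts midway between locus and vowel (1500 cps in the example) and only points toward the locus, which is consistent with the finding that the transition cannot actually begin at the locus.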