Synthetic methods applied to isolated syllables have permitted a systematic exploration of the acoustic cues to the perception of some of the consonant sounds. Methods, results, and working hypotheses are discussed.

The program of research in which we are engaged was described in general terms at the preceding Speech Communication Conference. As we pointed out there, and in more detail in another paper, our work on the perception of speech was based on the assumption that we would have a flexible and convenient experimental method if we could use a spectrographic display to control or manipulate speech sounds. Workers at the Bell Telephone Laboratories had developed the sound spectrograph, which made it instrumentally feasible to obtain spectrograms of relatively long samples of connected speech, and it had become evident that the spectrographic transform has important advantages over the oscillogram as a way of displaying speech sounds to the eye. We were interested in using the spectrogram not merely as a representation of speech sounds but also as a basis for modifying and, in the extreme case, creating them. For that purpose we built a machine called a pattern playback, which converts spectrographic pictures into sound, using either photographic copies of actual spectrograms or, alternatively, "synthetic" patterns painted by hand on a cellulose acetate base. Having first determined that the playback would speak quite intelligibly from photographic copies of actual spectrograms, we proceeded to prepare hand-painted patterns of test sentences which were, by comparison with the original spectrograms, very highly simplified (see Fig. 1).
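As a rough analogy to how the playback turns a painted pattern into sound, the following Python sketch treats the pattern as a grid of paint densities (one row per harmonic band, one column per time frame) and sums a sine wave for each band, weighted by the paint in the current frame. The function name `play_pattern`, the 120 cps fundamental, the 10 ms frame, and the 8000 samples-per-second rate are assumptions chosen for illustration; the actual machine worked optically and electronically, and this sketch does not describe its circuitry.

```python
import math

def play_pattern(pattern, f0=120.0, frame_s=0.01, rate=8000):
    """Hypothetical additive-synthesis analogue of a pattern playback.

    pattern[k][t] is the paint density (0.0 to 1.0) in time frame t of the band
    on harmonic k + 1, i.e. at frequency (k + 1) * f0. Keep (k + 1) * f0 below
    rate / 2 to avoid aliasing. Returns a list of audio samples.
    """
    n_bands = len(pattern)
    n_frames = len(pattern[0]) if n_bands else 0
    per_frame = int(round(frame_s * rate))  # samples per time frame
    samples = []
    for t in range(n_frames):
        for i in range(per_frame):
            # Global time index, so each harmonic keeps a continuous phase
            # across frame boundaries.
            time = (t * per_frame + i) / rate
            value = 0.0
            for k in range(n_bands):
                amp = pattern[k][t]
                if amp:
                    value += amp * math.sin(2 * math.pi * (k + 1) * f0 * time)
            samples.append(value)
    return samples

if __name__ == "__main__":
    # Toy pattern: 120 cps painted in frame 0, 240 cps at half density in frame 1.
    demo = [[1.0, 0.0],
            [0.0, 0.5]]
    print(len(play_pattern(demo)))  # prints 160 (2 frames x 80 samples)
```

Summing fixed harmonics keeps the sketch close to the idea of "painting" energy onto a fixed frequency grid; a finer grid (smaller `f0`) gives finer frequency resolution at the cost of more bands.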
In drawing the hand-painted spectrograms we tried, as a first step, to reproduce as well as we could those aspects of the original pattern that were most apparent to the eye; then, working back and forth between hand-painted spectrogram and sound, we modified the patterns, usually by trial and error, until the simplified spectrograms were highly intelligible.

The work with simplified spectrograms did not provide unequivocal answers to questions about the minimal and invariant patterns for the various sounds of speech, but it did enable us to develop our techniques and, further, it suggested certain specific problems that appeared to warrant more systematic investigation. In our research on these problems we have departed from the procedure of progressively simplifying the spectrograms of actual speech and have undertaken instead to study the effects on perception of variations in isolated acoustic elements or patterns. In this way we hope to determine the separate contributions of several acoustic variables to the perception of speech and, ultimately, to learn how they can be combined to best effect.

* This research was made possible in part by funds granted by the Carnegie Corporation of New York and in part through the support of the Department of Defense in connection with Contract DA49-170-sc-274.

STOP CONSONANTS: BURSTS OF NOISE

A careful inspect...
Previous studies with synthetic speech have shown that second-formant transitions are cues for the perception of the stop and nasal consonants. The results of those experiments can be simplified if it is assumed that each consonant has a characteristic and fixed frequency position, or locus, for the second formant, corresponding to the relatively fixed place of production of the consonant. On that basis, the transitions may be regarded as "movements" from the locus to the steady state of the vowel.

The experiments reported in this paper provide additional evidence concerning the existence and positions of these second-formant loci for the voiced stops b, d, and g. There appears to be a locus for d at 1800 cps and for b at 720 cps. A locus for g can be demonstrated only when the adjoining vowel has its second formant above about 1200 cps; below that level no g locus was found.

The results of these experiments indicate that, for the voiced stops, the transition cannot begin at the locus and proceed from there to the steady-state level of the vowel. Rather, if we are to hear the appropriate consonant, the first part of the transition must be silent. The voiced stops are best synthesized by making the duration of the silent interval equal to the duration of the transition itself. An experiment on the first formant revealed that its locus is the same for b, d, and g.

In an earlier experiment we undertook to find out whether the transitions (frequency shifts) of the second formant, often seen in spectrograms in the region where consonant and vowel join, can be cues for the identification of the voiced stop consonants. For that purpose we prepared a series of simplified, hand-painted spectrograms of transition-plus-vowel, converted these patterns into sound, and played the recordings to naive listeners for judgment as b, d, or g.
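One way to read the locus and silent-interval results together is as a straight-line F2 movement that is aimed at the locus but becomes audible only partway along. The Python sketch below assumes exactly that: a linear movement, a 50 ms transition, and a 10 ms step are illustrative assumptions, as are the names `LOCI_CPS`, `f2_onset`, and `f2_track`. Only the b and d locus values (720 and 1800 cps) come from the text; no g locus is included because no value is given for it.

```python
# Second-formant loci reported in the text (cps). No g locus is listed: the
# text reports one only for vowels with F2 above about 1200 cps, without a value.
LOCI_CPS = {"b": 720.0, "d": 1800.0}

def f2_onset(locus, vowel_f2, silent_ms, audible_ms):
    """F2 (cps) at the moment the transition becomes audible.

    Assumes F2 moves in a straight line from the locus to the vowel steady
    state over silent_ms + audible_ms, and that the first silent_ms of that
    movement is not heard.
    """
    fraction_silent = silent_ms / (silent_ms + audible_ms)
    return locus + (vowel_f2 - locus) * fraction_silent

def f2_track(consonant, vowel_f2, transition_ms=50.0, step_ms=10.0):
    """Audible F2 values (cps), one every step_ms, with the silent interval
    set equal to the transition duration."""
    locus = LOCI_CPS[consonant]
    start = f2_onset(locus, vowel_f2, transition_ms, transition_ms)
    steps = int(transition_ms // step_ms)
    return [start + (vowel_f2 - start) * i / steps for i in range(steps + 1)]

if __name__ == "__main__":
    print(f2_track("d", 1200.0))  # [1500.0, 1440.0, 1380.0, 1320.0, 1260.0, 1200.0]
```

Under these assumptions, equal silent and audible durations place the audible onset halfway between locus and vowel: before a vowel with F2 at 1200 cps, d starts at 1500 cps and b at 960 cps.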
The agreement among the listeners was, in general, sufficient to show that transitions of the second formant can serve as cues for the identification of the stops and, also, to enable us to select, for each vowel, the particular transitions that best produced each of the stop consonants. These transitions are shown in Fig. 1.

We found in further experiments² that these same second-formant transitions can serve as cues for the unvoiced stops (p, t, k) and the nasal consonants (m, n, ŋ), provided, of course, that the synthetic patterns are otherwise changed to contain appropriate acoustic cues for the voiceless and nasal manners of production. Moreover, and more important for the purposes of this paper, the results of these experiments plainly indicated a relationship between second-formant transition and articulatory place of production. Thus, the same second-formant transitions that had been found to produce b proved to be appropriate also for the synthesis of p and m, which, like b, are articulated at the lips; the second-formant transitions that produced d produced the consonants t and n, which have in forma...

² Liberman, Delattre, Cooper, and Gerstman, Psychol. Monogr. 68, No. 8, 1-13 (1954).
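The selection step described above, picking for each consonant the transition that listeners most consistently heard as that consonant, can be sketched as a tally of judgments. In this Python sketch the data layout, the function name `best_transitions`, and the "highest share of listeners" rule are assumptions for illustration, not the authors' published scoring procedure; the toy responses are invented.

```python
from collections import Counter

def best_transitions(judgments):
    """For each response category, find the stimulus that drew it most often.

    judgments maps a stimulus label (for example an F2 onset in cps) to the
    list of listener responses ("b", "d" or "g") collected for it.
    Returns {response: (stimulus, share_of_listeners)}.
    """
    best = {}
    for stimulus, responses in judgments.items():
        if not responses:
            continue  # no judgments collected for this stimulus
        for response, count in Counter(responses).items():
            share = count / len(responses)
            # Strict ">" keeps the first stimulus seen in case of a tie.
            if response not in best or share > best[response][1]:
                best[response] = (stimulus, share)
    return best

if __name__ == "__main__":
    # Invented responses for three stimuli, for illustration only.
    toy = {900: ["b", "b", "b", "d"],
           1500: ["d", "d", "d", "d"],
           2200: ["g", "g", "d", "g"]}
    print(best_transitions(toy))
    # {'b': (900, 0.75), 'd': (1500, 1.0), 'g': (2200, 0.75)}
```

Running the tally separately for each vowel's stimulus series would give one "best" transition per consonant per vowel.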
The comparison of syllable duration in English, German, Spanish, and French is presented in 18 tables and one figure. The tables show that the effect of stress and of the position of the syllable within the sense group varies considerably from one language to another, whereas the effect of syllable type (open vs. closed) is similar in all four languages. Among the three languages with variable stress intensity and stress placement, the duration differences between stressed and unstressed syllables are greatest in English and smallest in Spanish, with German intermediate. The same holds for the duration differences between final and non-final syllables. Variations in vowel intensity correlate with variations in syllable duration in English, German, and Spanish, but not in French, where the vowel of a final (stressed) syllable is on average slightly less intense than that of a non-final (unstressed) syllable. In the three languages where the place of stress varies, unstressed syllables are on average as long as, but less loud than, non-final stressed syllables.

Relative length is one of the elements that determine the perceptual "weight" of a syllable and give it prominence. As a first step towards comparing the distribution of syllable weight in English, German, Spanish, and French, we are investigating the length of ...

The research reported herein was performed pursuant to a contract with the United States Office of Education, Department of Health, Education, and Welfare.
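The cross-language comparisons in this abstract come down to ratios of mean syllable durations between two categories (stressed vs. unstressed, final vs. non-final, open vs. closed). The Python sketch below shows that arithmetic; the category labels, the choice of a ratio of means rather than another statistic, and the names `duration_ratio` and `rank_by_ratio` are assumptions, since the study's own measurement procedure is not described here.

```python
from statistics import mean

def duration_ratio(durations_ms, numerator="stressed", denominator="unstressed"):
    """Ratio of mean syllable durations between two categories.

    durations_ms maps a category label to a list of measured durations in ms.
    A ratio close to 1.0 means the two categories are about equally long.
    """
    return mean(durations_ms[numerator]) / mean(durations_ms[denominator])

def rank_by_ratio(samples, numerator="stressed", denominator="unstressed"):
    """Order languages from the largest to the smallest duration ratio.

    samples maps a language name to its durations_ms dictionary.
    """
    ratios = {lang: duration_ratio(d, numerator, denominator)
              for lang, d in samples.items()}
    return sorted(ratios, key=ratios.get, reverse=True)
```

The same functions cover the other comparisons by passing different labels, for example `duration_ratio(d, "final", "non-final")`. In this framing, a ranking from the stressed/unstressed ratio with English first, German second, and Spanish last would correspond to the pattern the abstract reports.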