This chapter summarizes the results we obtained over the last 15 years at Indiana University on the perception of synthetic speech produced by rule. A wide variety of behavioral studies have been carried out on phoneme intelligibility, word recognition, and comprehension to learn more about how human listeners perceive and understand synthetic speech. Some of this research, particularly the earlier studies on segmental intelligibility, was directed toward applied issues dealing with perceptual evaluation and assessment of different synthesis systems. Other aspects of the research program have been more theoretically motivated and were designed to learn more about speech perception and spoken language comprehension. Our findings have shown that the perception of synthetic speech depends on several general factors including the acoustic-phonetic properties of the speech signal, the specific cognitive demands of the information-processing task the listener is asked to perform, and the previous background and experience of the listener. Suggestions for future research on improving naturalness, intelligibility, and comprehension are offered in light of several recent findings on the role of stimulus variability and the contribution of indexical factors to speech perception and spoken word recognition. Our perceptual findings have shown the importance of behavioral testing with human listeners as an integral component of evaluation and assessment techniques in synthesis research and development.
Introduction

My interest in the perception of synthetic speech dates back to 1979 when the MITalk text-to-speech system was nearing completion [AHK87]. At that time, a number of people, including Dennis Klatt, Sheri Hunnicutt, Rolf Carlson, and Bjorn Granstrom, were working in the Speech Group at MIT on various aspects of this system. Given my earlier research on speech perception, it seemed quite appropriate to carry out a series of perceptual studies with human listeners to assess how good the MITalk synthetic speech actually was and what differences, if any, would be found in perception between natural speech and synthetic speech produced by rule. Since that time, my research group at Indiana has conducted a large number of experiments to learn more about the perception and comprehension of synthetic speech produced by rule. This chapter provides a summary and interpretation of the major findings obtained over the last 15 years and some suggestions for future research directions.