The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deploying effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.

Speech-processing technology is now at the point at which people can engage in voice dialogues with machines, at least in limited ways. Simple voice communication with machines is now deployed in personal computers, in the automation of long-distance calls, and in voice dialing of mobile telephones. These systems have small vocabularies and strictly circumscribed task domains. In research laboratories there are advanced human-machine dialogue systems with vocabularies of thousands of words and intelligence to carry on a conversation on specific topics. Despite these successes, it is clear that the truly intelligent systems envisioned in science fiction are still far in the future, given the state of the art today.

Human-machine dialogue systems can be represented as a four-step process, as shown in Fig. 1. This figure encompasses both the simple systems deployed today and the spoken-language understanding we envision for the future.
First, a speech recognizer transcribes sentences spoken by a person into written text (1, 2). Second, a language-understanding module extracts the meaning from the text (3, 4). Third, a computer (consisting of a processor and a database) performs some action based on the meaning of what was said. Fourth, the person receives feedback from the computer in the form of a voice created by a speech synthesizer (5, 6). The boundaries between these stages of a dialogue system may not be distinct in practice. For instance, language-understanding modules may have to cope with errors in the text from the speech recognizer, and the speech recognizer may make use of grammar and semantic constraints from the language module in order to reduce recognition errors.

In the 1993 "Colloquium on Human-Machine Communication by Voice," sponsored by the National Academy of Sciences (NAS), much of the discussion focused on practical difficulties in building and deploying systems for carrying on voice dialogues between humans and machines. Deployment of systems for human-to-mach...
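The four-step process described above can be sketched as a minimal dialogue loop. This is an illustrative toy, not any real system's API: every class, method, and the hard-coded utterance below are hypothetical stand-ins for the recognizer, language-understanding module, back-end computer, and synthesizer in Fig. 1.

```python
# Toy sketch of the four-step human-machine dialogue loop.
# All names here are hypothetical; no real speech library is used.

class Recognizer:
    """Step 1: transcribe spoken audio into written text."""
    def transcribe(self, audio):
        # A real recognizer would apply acoustic and language models;
        # here we return a canned transcription for illustration.
        return "what is the balance of my account"

class LanguageUnderstander:
    """Step 2: extract a structured meaning from the text."""
    def parse(self, text):
        if "balance" in text:
            return {"intent": "query_balance"}
        return {"intent": "unknown"}

class Backend:
    """Step 3: perform an action based on the extracted meaning."""
    def execute(self, meaning):
        if meaning["intent"] == "query_balance":
            return "Your balance is 42 dollars."
        return "Sorry, I did not understand."

class Synthesizer:
    """Step 4: turn the response text back into a spoken voice."""
    def speak(self, text):
        # Stand-in for waveform generation.
        return f"<audio: {text}>"

def dialogue_turn(audio, asr, nlu, backend, tts):
    text = asr.transcribe(audio)          # step 1: speech -> text
    meaning = nlu.parse(text)             # step 2: text -> meaning
    response = backend.execute(meaning)   # step 3: meaning -> action
    return tts.speak(response)            # step 4: text -> speech

print(dialogue_turn(b"raw-audio", Recognizer(), LanguageUnderstander(),
                    Backend(), Synthesizer()))
```

In practice, as the text notes, these boundaries blur: a robust recognizer would consult the language module's grammar during decoding rather than hand off a finished transcript.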
This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. The software development environment, a key technology in developing application software ranging from DSP software to support software, is also described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning the evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions.

Recently, voice-processing technology has greatly improved. There is a large gap between the present voice-processing technology and that of 10 years ago. The speech recognition and synthesis market, however, has lagged far behind technological progress. This paper describes the state of the art in voice-processing technology applications and points out several problems concerning market growth that need to be solved.

Technologies related to applications can be divided into two categories: system technologies, and speech recognition and synthesis algorithms. Hardware and software technologies are the main topics for system development. Hardware technologies are very important because any speech algorithm is destined for implementation on hardware. Technology in this area is advancing quickly. Almost all speech recognition/synthesis algorithms can be run with a microprocessor and several DSPs. With the progress of device technology and parallel architecture, hardware technology will continue to improve and will be able to cope with the huge number of calculations demanded by the improved algorithms of the future.
Software technologies are also an important factor, since algorithms and application procedures must be implemented in software. In this paper, therefore, software technology is treated as an application development tool. Along with the growing areas of application of voice-processing technology, various architectures and tools that support application development have been devised. When speech processing is the application target, it is also important to keep in mind the characteristics peculiar to speech: speech communication is by nature real-time and interactive, and computer systems that handle speech communication with users must be able to cope with these operating conditions. Several issues concerning real-time interactive communication will be described.

For algorithms there are two important issues concerning application. One is the evaluation of algorithms, and the other is the robustness of algorithms under adverse conditions. Evaluation of speech recognition and synthesis algorithms has been one of the main topics in the research area. However, to consider applicatio...
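The real-time constraint mentioned above can be made concrete with a simple frame-based processing loop: to keep up with live speech, each audio frame must on average be handled in less time than the frame itself lasts. This is a minimal sketch under assumed parameters (a 20-ms frame, an illustrative per-frame energy computation), not any particular system's architecture.

```python
import time

FRAME_MS = 20  # an assumed analysis frame length, common for speech

def process_stream(frames, process_frame):
    """Run a per-frame handler and count real-time deadline misses.

    If a frame takes longer than its own duration to process, a live
    system falls behind the talker and latency grows without bound.
    """
    deadline_missed = 0
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms > FRAME_MS:
            deadline_missed += 1
    return deadline_missed

def energy(frame):
    # A trivial stand-in for real per-frame analysis work.
    return sum(s * s for s in frame)

# 50 frames of 160 samples (20 ms at 8 kHz), all processed in budget.
missed = process_stream([[0.1] * 160 for _ in range(50)], energy)
print(f"frames over budget: {missed}")
```

A real interactive system would layer buffering and scheduling on top of this loop, but the budget-per-frame constraint is the same.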
This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

VISION OF THE FUTURE

For the majority of humankind, speech production and understanding are quite natural and unconsciously acquired processes performed quickly and effectively throughout our daily lives. By the year 2001, speech synthesis and recognition systems are expected to play important roles in advanced user-friendly human-machine interfaces (1). Speech recognition systems include not only those that recognize messages but also those that recognize the identity of the speaker.
Services using these systems will include database access and management, various order-made services, dictation and editing, electronic secretarial assistance, robots (e.g., the computer HAL in 2001: A Space Odyssey), automatic interpreting (translating) telephony, security control, and aids for the handicapped (e.g., reading aids for the blind and speaking aids for the vocally handicapped) (2). Today, many people in developed countries are employed to sit at computer terminals wearing telephone headsets and transfer information from callers to computer systems (databases) and vice versa (information and transaction services). According to the basic idea that boring and repetitive tasks done by human beings should be taken over by machines, these information-transfer workers should be replaced by speech recognition and synthesis machines. Dictation or voice typewriting is expected to increase the speed of input to computers and to allow many operations to be carried out without hand or eye movements that distract attention from the task on the display.

Fig. 1 shows a typical structure for task-specific voice control and dialogue systems. Although the speech recognizer, which converts spoken input into text, and the language analyzer, which extracts meaning from text, are separated into two boxes in the figure, ...