This paper describes a new speech recognition algorithm that uses stereo vision pattern recognition equations with competition and cooperation. We applied recently developed 3-layered neural net (3LNN) equations to speech recognition. The proposed acoustic models built on these equations yield better recognition results than hidden Markov models (HMMs). On a 216-word (240-word) database, the stereo vision acoustic models gave 6.5% (6.6%) higher accuracy than HMMs.
This paper presents an experimental study of an agent system with multimodal interfaces for a smart office environment. The agent system is built on multimodal interfaces: recognition modules for speech and pen-mouse gestures, and identification modules for face and fingerprint. As core modules, speech recognition and synthesis provide the virtual interaction between user and system. In this study, a real-time speech recognizer based on a Hidden Markov Network (HM-Net) was incorporated into the proposed system. In addition, face and fingerprint identification techniques were adopted to give each identified user secure, customized interaction in an office environment. The evaluation showed that the proposed system was easy to use and would prove useful in a smart office environment, even though the performance of the speech recognizer was unsatisfactory, mainly because of environmental noise.
This paper aims to develop an intelligent robot that emulates the human synesthetic skill of associating a color image with sound, so that an application system can be built on the principle of mutual conversion between color image and sound. As a first step, this study implements a basic system for color-image-to-sound conversion. It describes a new method for converting a color image into sound, based on the similarity between the physical frequency information of light and that of sound. In addition, we present a method of converting a color image into sound that uses color model conversion together with histograms computed in the converted color model. On the basis of the proposed method, we built a basic system using Microsoft Visual C++ (ver. 6.0). The simulation results showed that the hue, saturation, and intensity elements of an input color image were converted into the F0, harmonic, and octave elements of a sound, respectively. The converted sound elements were synthesized into a sound source in WAV file format using the Csound toolkit.
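The hue, saturation, and intensity mapping described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual mapping functions, so the histogram bin count, the base frequency `base_f0`, the harmonic and octave ranges, and the helper names are all assumptions made here for illustration. The standard-library HSV conversion also stands in for the HSI color model named in the abstract.

```python
import colorsys
import math


def dominant_bin(values, bins=12):
    """Return the centre of the most populated histogram bin for values in [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    peak = counts.index(max(counts))
    return (peak + 0.5) / bins


def image_to_sound_params(pixels, base_f0=261.63):
    """Map RGB pixels (0-255 tuples) to (f0_hz, n_harmonics, octave).

    Assumed mapping (hypothetical, not taken from the paper):
    dominant hue -> F0, dominant saturation -> harmonic count,
    dominant brightness -> octave shift.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hue = dominant_bin([h for h, _, _ in hsv])
    sat = dominant_bin([s for _, s, _ in hsv])
    val = dominant_bin([v for _, _, v in hsv])
    f0 = base_f0 * 2 ** hue          # hue sweeps one octave above base_f0
    n_harmonics = 1 + int(sat * 8)   # more saturation -> richer spectrum
    octave = int(val * 4) - 2        # brightness -> octave shift in [-2, 1]
    return f0, n_harmonics, octave


def synthesize(f0, n_harmonics, octave, seconds=0.5, rate=44100):
    """Additive synthesis: sum of harmonics with 1/k amplitudes, peak-normalised."""
    freq = f0 * 2 ** octave
    samples = []
    for n in range(int(seconds * rate)):
        t = n / rate
        samples.append(sum(math.sin(2 * math.pi * freq * k * t) / k
                           for k in range(1, n_harmonics + 1)))
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]


# Example: a small patch of pure red pixels.
f0, n_harmonics, octave = image_to_sound_params([(255, 0, 0)] * 16)
tone = synthesize(f0, n_harmonics, octave)
```

The resulting samples could be written to a WAV file with Python's standard `wave` module. The system described in the abstract uses Csound for that step instead.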