Detection and Separation of Speech Event Using Audio and Video Information Fusion and Its Application to Robust Speech Interface

Asano, Futoshi; Yamamoto, Kiyoshi; Hara, Isao; Ogata, Jun; Yoshimura, Takashi; Motomura, Yoichi; Ichimura, Naoyuki; Asoh, Hideki

doi:10.1155/s1110865704402303

Cited by 26 publications

(13 citation statements)

References 14 publications

“…Signal processing technique uses MUSIC spectrum method and fusion of video by Bayesian network, and it reduces environment noise by using ML beamforming (details are described in [17,18]). …”

Section: Noise Robust Speech Recognition (mentioning)

confidence: 99%

An extensible dialogue script for robot based on unification of state transition models

Matsusaka; Fujii; Hara

2009

2009 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA)

Self Cite

We propose an extension-by-unification method to improve the reusability of dialogue components in the development of robot communication functions. Compared to the extension-by-connection method previously used in developing behavior-based communication robots, the extension-by-unification method can decompose a script into components. The decomposed components can easily be recomposed to build a new application. In this paper, we first explain a reformulation we have applied to the conventional state-transition model. Second, we explain a set of algorithms to decompose and recompose components and to detect conflicts between them. Third, we explain a dialogue engine and a script management server we have developed. The script management server uses the conflict detection algorithm to propose reusable components to the developer in real time. The dialogue engine SEAT (Speech Event-Action Translator) has a flexible adapter mechanism that enables quick integration into robotic systems. By applying the method to three robots, we confirmed that development efficiency improved by 30%.

“…Bayesian networks are a way of modeling a joint probability distribution of multiple random variables. In [10], a Bayesian network was used to detect the time and position of speech events by analyzing audio and video data. The gained information was then utilized to robustly recognize and separate speech signals in noisy and reverberant environments.…”

Section: Introduction (mentioning)

confidence: 99%

“…The introduced tracking algorithm is solely based on color distributions to identify and track moving objects in a video sequence. It is a robust technique more flexible than the background subtraction method [10] and well-suited for abrupt changes in the camera position as well as for alterations in the environment [14].…”

Section: Introduction (mentioning)

confidence: 99%

Dynamical information fusion of heterogeneous sensors for 3D tracking using particle swarm optimization

2011

“…Most existing methods for speaker detection are realized by combining techniques of sound localization via a microphone array and human tracking via background subtraction by using coupled Hidden Markov Models (HMMs) or Dynamic Bayesian Networks (DBNs) [11,2]. However, because of the spatial resolution of the microphone array, these methods can become ineffective in situations where speakers are physically close to each other.…”

Section: Introduction (mentioning)

confidence: 99%

Speaker detection using the timing structure of lip motion and sound

Horii; Kawashima; Matsuyama

2008

2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
