2006 6th IEEE-RAS International Conference on Humanoid Robots 2006
DOI: 10.1109/ichr.2006.321359

Speech Recognition for a Humanoid with Motor Noise Utilizing Missing Feature Theory

Abstract: Automatic speech recognition (ASR) is essential for human-humanoid communication. One of the main problems with ASR is that a humanoid inevitably generates motor noise. This noise is easily captured by the humanoid's microphones because the noise sources are closer to the microphones than the target speech source. Thus, the signal-to-noise ratio (SNR) of the input speech becomes quite low (sometimes less than 0 dB). However, it is possible to estimate these noises by using information about the humanoid's ow…

Cited by 19 publications (20 citation statements)
References 20 publications
“…Their model, however, was unable to deal with ego-motion noise. Nishimura et al [21] estimated the ego noise of distinct gestures and motions of the robot. Using motion commands, the pre-recorded correct noise template matching a recent motion could be selected from the template database and the acoustic features of the aligned template could be used for MFT weight calculation.…”
Section: Related Work (mentioning)
confidence: 99%
“…We introduced a stream weight optimization module which is mentioned in Section II. For AVSR implementation, MFT-based Julius [18] was used.…”
Section: The Second Layer AV Integration Block (mentioning)
confidence: 99%
“…For a simultaneous speech recognition task of several speakers, Yamamoto et al [7] and Takahashi et al [10] proposed a model for mask generation based on the disturbing effect of leakage noise over speech, because an imperfect source separation causes distorting elements, however their model is unable to deal with ego-motion noise. Nishimura et al [11] estimated ego noises of distinct gestures and motions of the robot. Using motion commands, the pre-recorded correct noise template matching to the recent motion was selected from the template database and the acoustic features of the aligned template are used for MFT weight calculation.…”
Section: A Comparison To Related Work (mentioning)
confidence: 99%
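
The citation statements above describe selecting a pre-recorded noise template by motion command and using its acoustic features to compute MFT weights, but they do not say how the weights are derived. The following is a minimal sketch of that idea under stated assumptions: the template database, its feature values, the function names, and the hard 0/1 threshold rule with a `margin` parameter are all hypothetical illustrations, not the method of the cited paper.

```python
from typing import Dict, List

Features = List[List[float]]  # frames x frequency bands (log-spectral energy)

# Hypothetical template database: motion command -> pre-recorded noise features.
# The values are made up purely for illustration.
NOISE_TEMPLATES: Dict[str, Features] = {
    "head_turn": [[5.0, 1.0, 0.5], [5.5, 1.2, 0.4]],
    "arm_raise": [[2.0, 4.0, 3.0], [2.2, 4.1, 3.3]],
}


def select_template(motion_command: str) -> Features:
    """Pick the noise template recorded for the given motion command."""
    return NOISE_TEMPLATES[motion_command]


def mft_weights(observed: Features, template: Features, margin: float = 1.0) -> Features:
    """Assumed rule: a band is reliable (weight 1.0) when the observed energy
    exceeds the template's noise energy by more than `margin`, else 0.0."""
    weights = []
    for obs_frame, noise_frame in zip(observed, template):
        weights.append(
            [1.0 if o - n > margin else 0.0 for o, n in zip(obs_frame, noise_frame)]
        )
    return weights


# Observed features are assumed to be already time-aligned with the template.
observed = [[5.2, 3.0, 2.5], [5.4, 1.5, 2.0]]
weights = mft_weights(observed, select_template("head_turn"))
print(weights)
```

In this sketch, bands dominated by the expected motor noise receive weight 0.0 so that a missing-feature decoder could ignore them; a real system might use soft weights between 0 and 1 instead.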