Distant talking robust speech recognition using late reflection components of room impulse response

2010 10th IEEE-RAS International Conference on Humanoid Robots

2010

Self Cite

Abstract: In enclosed environments where robots are deployed, the observed speech signal is smeared by reverberation, which degrades automatic speech recognition (ASR) performance. Hands-free speech recognition for human-machine communication is therefore a difficult task. Most speech enhancement techniques used to address this problem enhance the contaminated waveform independently of the ASR system. However, this approach does not necessarily improve ASR performance. In this paper, we extend the conventional spectral subtraction (SS) technique to deal with reverberation. In our proposed approach, the dereverberation parameters of SS are optimized to improve the likelihood of the acoustic model rather than just the waveform signal. The system adaptively fine-tunes these parameters jointly with acoustic model training for effective use in ASR applications. We experimented with real reverberant data collected from an operational robot, and we also evaluated the method on reverberant data corrupted by environmental and robot internal noise. Experimental results show that the proposed method significantly improves recognition performance over the conventional approach.
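The abstract does not give the SS formulation itself. As a rough illustration of the general idea of subtracting late reverberation estimated from a room impulse response (RIR), the NumPy sketch below makes several assumptions: the late part of the RIR starts 50 ms after its first sample; the late-reverberation power in each frame is a weighted sum of past observed power spectra, with weights taken from the frame energies of the late RIR; and the SS rule uses an over-subtraction factor `delta` and a spectral floor `beta`. These names, the 50 ms split, and the use of the observed spectrum as a stand-in for the clean one are all illustrative choices, and the likelihood-based parameter optimization described in the abstract is not implemented.

```python
import numpy as np

def power_spectrogram(x, frame_len=512, hop=128):
    """Hann-windowed short-time power spectrum |X(t, f)|^2, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def late_reverb_weights(rir, fs, hop=128, late_start_ms=50.0):
    """Frame-wise energy of the late RIR part relative to the early part (assumed model)."""
    split = int(fs * late_start_ms / 1000)          # assumed early/late boundary in samples
    early, late = rir[:split], rir[split:]
    early_energy = np.sum(early ** 2) + 1e-12
    n_late_frames = max(1, len(late) // hop)
    weights = np.array([np.sum(late[k * hop:(k + 1) * hop] ** 2)
                        for k in range(n_late_frames)]) / early_energy
    return weights, split // hop                     # weights and their delay in frames

def subtract_late_reverb(P_obs, weights, delay, delta=1.0, beta=0.01):
    """Power-domain SS: subtract delta * estimated late power, floored at beta * P_obs."""
    n_frames = P_obs.shape[0]
    P_late = np.zeros_like(P_obs)
    for k, w in enumerate(weights):
        shift = delay + k
        if shift >= n_frames:
            break
        # Past observed power stands in for the unknown early-reverberant speech (assumption).
        P_late[shift:] += w * P_obs[:n_frames - shift]
    return np.maximum(P_obs - delta * P_late, beta * P_obs)

# Toy data: white noise as a speech stand-in and a synthetic exponentially decaying RIR.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(int(0.6 * fs)) / fs
rir = rng.standard_normal(t.size) * np.exp(-6.9 * t / 0.6)   # roughly 60 dB energy decay over 0.6 s
rir[0] = 1.0                                                  # direct path
observed = np.convolve(rng.standard_normal(fs), rir)

P_obs = power_spectrogram(observed)
weights, delay = late_reverb_weights(rir, fs)
P_enh = subtract_late_reverb(P_obs, weights, delay)
```

A larger `delta` removes more late energy at the cost of more spectral distortion. In the approach summarized above, parameters of this kind are tuned for acoustic-model likelihood rather than set by hand.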

Section: Introduction (mentioning)

Robust hands-free Automatic Speech Recognition for human-machine interaction

Gómez

Kawahara

2010 10th IEEE-RAS International Conference on Humanoid Robots

2010

Self Cite

“…Nakatani et al. proposed a high-performance method of blind dereverberation based on a Short-Time Fourier Transform (STFT) representation [1]. Gomez et al. applied fast spectral subtraction to late reverberation by using a pre-recorded impulse response [2]. However, these and other familiar methods have either not dealt with the echo-cancellation problem or relied on a priori knowledge about the environment, such as the room impulse response.…”

Section: Introduction (mentioning)

ICA-based efficient blind dereverberation and echo cancellation method for barge-in-able robot audition

Takeda

2009 IEEE International Conference on Acoustics, Speech and Signal Processing

Takahashi

et al. 2009

This paper describes a new method that enables "barge-in" for robot audition in various environments. "Barge-in" means that a user starts speaking while the robot is still speaking. To support it, blind dereverberation and echo cancellation must be handled at the same time. We adopt Independent Component Analysis (ICA) because it provides a natural framework for both problems. To deal with reverberation, we apply an observation model based on the Multiple Input/Output INverse-filtering Theorem (MINT) to frequency-domain ICA. The main problem is the high computational cost of ICA. We reduce the computational complexity to linear order in the reverberation time by using two techniques: 1) a separation model based on the independence of the observed signals, and 2) enforced spatial sphering as preprocessing. Experimental results showed that our method improved the word correctness of reverberant speech by 10-20 points.
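The abstract names "enforced spatial sphering" as a preprocessing step without defining it. For reference, ordinary spatial sphering (whitening) decorrelates the channels of a multichannel observation and normalizes their variances, a common preprocessing step in frequency-domain ICA: with spatial covariance $R = E\Lambda E^H$, the sphering matrix is $V = E\Lambda^{-1/2}E^H$, so that $z = Vx$ has identity covariance. The sketch below shows only this standard eigendecomposition-based whitening, assuming complex STFT observations for a single frequency bin arranged as channels × frames. The function name `spatial_sphering` and the toy mixing data are hypothetical, and how the "enforced" variant differs is not modeled here.

```python
import numpy as np

def spatial_sphering(X):
    """Whiten complex observations X (channels x frames) of one frequency bin.

    Returns Z = V @ X, whose sample covariance is approximately I, and the sphering matrix V.
    """
    R = X @ X.conj().T / X.shape[1]          # spatial covariance estimate (zero mean assumed)
    eigvals, eigvecs = np.linalg.eigh(R)     # R is Hermitian and, here, positive definite
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.conj().T
    return V @ X, V

# Toy example: four complex Gaussian sources mixed by a random complex matrix.
rng = np.random.default_rng(1)
n_ch, n_frames = 4, 2000
S = rng.standard_normal((n_ch, n_frames)) + 1j * rng.standard_normal((n_ch, n_frames))
A = rng.standard_normal((n_ch, n_ch)) + 1j * rng.standard_normal((n_ch, n_ch))
X = A @ S

Z, V = spatial_sphering(X)
R_z = Z @ Z.conj().T / n_frames              # should be close to the identity matrix
```

The symmetric form $E\Lambda^{-1/2}E^H$ is one of several valid whitening matrices. Any matrix of the form $QV$ with unitary $Q$ also whitens the data, which leaves ICA to resolve the remaining rotation.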

“…We adopted multi-channel semi-blind independent component analysis (MCSB-ICA) [1] because 1) it is theoretically robust against Gaussian noise, such as fan noise, and 2) it can, in theory, separate the known speech, the user's speech, and other sound sources, including their reverberations. Other methods do not handle known-source signals [2], [3], [4] or the user's speech signals [5], or cannot handle reverberation [6], [7]. The requirements for MCSB-ICA in robot audition are: a) fast convergence when estimating the separation filter of the source signals, and b) low computational cost.…”

Section: Introduction (mentioning)

Step-size parameter adaptation of multi-channel semi-blind ICA with piecewise linear model for barge-in-able robot audition

Takeda

2009 IEEE/RSJ International Conference on Intelligent Robots and Systems

Takahashi

et al. 2009

Abstract: This paper describes a step-size parameter adaptation technique for multi-channel semi-blind independent component analysis (MCSB-ICA) in a "barge-in-able" robot audition system. By "barge-in", we mean that the user can speak while the robot is speaking. We focused on MCSB-ICA to build such an audition system because it can separate a user's and a robot's speech in reverberant environments. The problem with MCSB-ICA for robot audition is slow convergence when estimating the separation filter, caused by its step-size parameters. Many optimization methods cannot be adopted because their computational cost is proportional to the square of the reverberation time. Our method provides adaptive step-size parameters for MCSB-ICA at low computational cost. It is based on three techniques: 1) a recursive expression of the separation process, 2) a piecewise linear model of the step size of the separation filter, and 3) adaptive step-size parameters with a sub-ICA-filter. Experimental results show that our approach achieves faster convergence and lower computational cost than MCSB-ICA with a fixed step-size parameter.
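The abstract mentions a piecewise linear model of the step size of the separation filter but does not give its exact form. One minimal reading, stated here purely as an assumption, is a separate step size for each tap (delay) of a multi-tap separation filter, obtained by linear interpolation between a few control points, so that only the control-point values would need to be adapted. The sketch below uses hypothetical names (`piecewise_linear_step_sizes`, `update_filter`, `knots`, `values`), a real-valued toy gradient, and fixed control-point values. It does not reproduce the paper's adaptation of those values with a sub-ICA-filter.

```python
import numpy as np

def piecewise_linear_step_sizes(n_taps, knots, values):
    """Return one step size per filter tap, linearly interpolated between control points.

    knots: increasing tap indices of the control points (hypothetical parameterization).
    values: step size at each knot; np.interp holds the end values constant outside the knots.
    """
    return np.interp(np.arange(n_taps), knots, values)

def update_filter(W, grad, mu):
    """One gradient step on a multi-tap filter W (taps x out x in), scaling each tap by its own step size."""
    return W - mu[:, None, None] * grad

# Toy usage: larger steps for early taps, smaller steps for late (reverberant) taps.
n_taps = 10
mu = piecewise_linear_step_sizes(n_taps, knots=[0, 3, 9], values=[0.1, 0.05, 0.01])
rng = np.random.default_rng(0)
W = np.zeros((n_taps, 2, 2))
grad = rng.standard_normal((n_taps, 2, 2))   # stand-in for an ICA gradient
W_next = update_filter(W, grad, mu)
```

With this parameterization, adapting three control-point values adjusts the step sizes of all ten taps, which is one way a piecewise linear model can keep step-size adaptation cheap.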