The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes

Vincent, Emmanuel; Barker, Jon; Watanabe, Shinji; Le Roux, Jonathan; Nesta, Francesco; Matassoni, Marco

doi:10.1109/asru.2013.6707723

Cited by 66 publications

(76 citation statements)

References 6 publications

“…The first CHiME Challenge held in 2011 was the first concerted evaluation of ASR systems in a real-world domestic environment involving both reverberation and highly dynamic background noise made up of multiple sound source [50]. The second CHiME Challenge in 2013 was supported by the IEEE AASP, MLSP and SL Technical Committees [51]. The configuration considered by this Challenge was that of speech from a single target speaker being binaurally recorded in a domestic environment involving multisource background noise.…”

Section: Smart Home and AAL (mentioning)

confidence: 99%

On Distant Speech Recognition for Home Automation

2015

Abstract. In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people living alone at home. This study is part of the Sweet-Home project which aims at developing a new home automation system based on voice command to improve support and well-being of people in loss of autonomy. The goal of the study is vocal order recognition with a focus on two aspects: distance speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant speech French corpus was recorded with 21 speakers who acted scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution which uses the two best SNR channels and a priori knowledge (voice commands and distress sentences) has demonstrated an increase in recognition rate without introducing false alarms. Generally speaking, a short overview then outlines the research challenges that speech technologies must take up for Ambient Assisted Living and Augmentative and Alternative Communication, and the current research avenues in this domain.

“…In comparing our results against those obtained by the actual participants of the CHiME Challenge [20], ours are among the top two. Note that the CHiME challenge participants employed strategies at the spatial signal, feature and model levels [21] Table 2. WER under different SNRs.…”

Section: Speech Recognition (mentioning)

confidence: 99%

Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition

Feng, Zhang, Glass

2014

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Denoising autoencoders (DAs) have shown success in generating robust features for images, but there has been limited work in applying DAs for speech. In this paper we present a deep denoising autoencoder (DDA) framework that can produce robust speech features for noisy reverberant speech recognition. The DDA is first pre-trained as restricted Boltzmann machines (RBMs) in an unsupervised fashion. Then it is unrolled to autoencoders, and fine-tuned by corresponding clean speech features to learn a nonlinear mapping from noisy to clean features. Acoustic models are re-trained using the reconstructed features from the DDA, and speech recognition is performed. The proposed approach is evaluated on the CHiME-WSJ0 corpus, and shows a 16-25% absolute improvement on the recognition accuracy under various SNRs.

Index Terms: robust speech recognition, feature denoising, denoising autoencoder, deep neural network

“…In the previous two-channel CHiME challenges (Vincent et al, 2013a) target enhancement has been achieved using mixed strategies exploiting both spatial and spectral diversity. However, the CHiME-3 scenario, with 5-forward facing microphones, a relatively fixed speaker location and wide, open environments lends itself strongly to multichannel beamforming approaches.…”

Section: Target Enhancement (mentioning)

confidence: 99%

“…The system is based on the Kaldi DNN-system recipe for Track 2 of the 2nd CHiME challenge (Vincent et al, 2013a). Feature vectors are constructed from concatenating 7 frames of 13 dimensional Mel-frequency cepstral coefficients (MFCCs) then compressing to 40 dimensions using LDA with one of 2500 tied tri-phone HMM states as the class.…”

Section: Speech Recognition (mentioning)

confidence: 99%
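
The frame concatenation mentioned in the statement above can be illustrated with a minimal sketch. It covers only the splicing step (7 frames of 13 coefficients giving 91 dimensions) and not the LDA compression to 40 dimensions. The function name `splice_frames`, the symmetric context of 3 frames on each side, and the padding of utterance edges by repeating the first and last frame are assumptions made for illustration; they are not taken from the quoted paper or from the Kaldi recipe.

```python
from typing import List

Frame = List[float]


def splice_frames(frames: List[Frame], context: int = 3) -> List[Frame]:
    """Stack each frame with `context` neighbours on either side.

    With context=3 and 13-dimensional input this yields 7 * 13 = 91 values
    per frame. Edge handling (repeating the first/last frame) is an assumption.
    """
    if not frames:
        return []
    last = len(frames) - 1
    spliced: List[Frame] = []
    for t in range(len(frames)):
        stacked: Frame = []
        for offset in range(-context, context + 1):
            # Clamp the index so that edges reuse the first or last frame.
            idx = min(max(t + offset, 0), last)
            stacked.extend(frames[idx])
        spliced.append(stacked)
    return spliced


# Toy input: 10 frames of 13 values each, where frame t is filled with t.
mfcc = [[float(t)] * 13 for t in range(10)]
spliced = splice_frames(mfcc)
```

In a full pipeline the 91-dimensional vectors would then be projected to 40 dimensions with an LDA transform estimated from state-labelled training data, which is beyond this sketch.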

The third ‘CHiME’ speech separation and recognition challenge: Analysis and outcomes

Barker, Marxer, Vincent, et al.

2017

Computer Speech & Language

Self Cite

This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various 'axes of difficulty' by correlating various estimated signal properties with typical system performance on a per session and per utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations.
