2017
DOI: 10.1016/j.csl.2016.10.005

The third ‘CHiME’ speech separation and recognition challenge: Analysis and outcomes

Abstract: This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of…

Cited by 103 publications (62 citation statements)
References 34 publications
“…The performance on real and simulated data appears to be similar on the development set but quite different on the test set. This difference is mostly due to the fact that the test speakers produced less intelligible speech when recorded in noisy environments than when recorded in a booth (Barker et al, 2016). By contrast, the development speakers produced similarly intelligible speech in both situations.…”
Section: Baseline; citation type: mentioning
confidence: 92%
“…The real data consists of utterances spoken live by 12 US English talkers in these environments and recorded by a tablet equipped with an array of six sample-synchronized microphones: two microphones numbered 1 and 3 facing forward on the top left and right, one microphone numbered 2 facing backward on the top center, and three microphones numbered 4, 5, and 6 facing forward on the bottom left, center, and right. See Barker et al (2016, Fig. 1) for a diagram.…”
Section: Characterization of the Mismatches; citation type: mentioning
confidence: 99%