2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
DOI: 10.1109/asru.2015.7404823
Adaptive beamforming and adaptive training of DNN acoustic models for enhanced multichannel noisy speech recognition

Cited by 8 publications (10 citation statements) · References 24 publications
“…The improvement brought by these techniques appears to be quite correlated between real and simulated data. Other authors also found this result to hold for auditory-motivated features such as the Gabor filterbank (GBFB) (Martinez and Meyer, 2015) and the amplitude modulation filter bank (AMFB) (Moritz et al., 2015), and for feature transformation/augmentation methods such as vocal tract length normalization (VTLN) (Tachioka et al., 2015) or i-vectors (Pang and Zhu, 2015; Prudnikov et al., 2015), provided that these features and methods are applied to noisy data or to data enhanced using the robust beamforming or source separation techniques listed in Section 3.2. Interestingly, Tachioka et al. (2015) found VTLN to yield consistent results on real vs. simulated data when using GEV beamforming as a pre-processing step, but opposite results when using MVDR beamforming instead.…”
Section: Robust Features and Feature Normalization
confidence: 98%
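As a rough illustration of the VTLN idea mentioned in the quote above, here is a hedged sketch of piecewise-linear frequency warping. The parameterization (warp factor `alpha`, cut-off fraction `f_cut`) is an illustrative assumption, not the exact scheme used by Tachioka et al. (2015).

```python
import numpy as np

def vtln_warp(freqs, alpha, f_max=8000.0, f_cut=0.85):
    """Piecewise-linear VTLN frequency warping (illustrative sketch).

    Frequencies below f_cut * f_max are scaled by 1/alpha; above the cut,
    a linear segment maps the remainder onto [cut/alpha, f_max] so that
    the Nyquist frequency is preserved.
    """
    freqs = np.asarray(freqs, dtype=float)
    cut = f_cut * f_max
    warped_cut = cut / alpha
    low = freqs / alpha
    # Linear segment from (cut, warped_cut) to (f_max, f_max).
    slope = (f_max - warped_cut) / (f_max - cut)
    high = warped_cut + slope * (freqs - cut)
    return np.where(freqs <= cut, low, high)
```

With `alpha = 1.0` the warp is the identity; `alpha > 1` compresses the low-frequency region while keeping the band edge fixed.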
“…This constraint is valid for the CHiME-3 simulated data, which are simulated using a pure delay filter, but it does not hold anymore on real data. Indeed, early reflections (and, to a lesser extent, reverberation) modify the apparent speaker direction at each frequency, which results […]. [Table 4: WER (%) achieved by beamforming and spatial post-filtering applied on all channels except ch2, using the GMM backend retrained on enhanced real and simulated data (Prudnikov et al., 2015).] Delay-and-sum beamformers such as BeamformIt (Anguera et al., 2007), which was used in many challenge submissions, do not suffer from this issue due to the fact that their spatial response decays slowly in the neighborhood of the estimated speaker direction. Modern adaptive beamformers such as MCA or the mask-based MVDR beamformer of Yoshioka et al. (2015) do not suffer from this issue either, due to the fact that they estimate the relative (inter-microphone) transfer function instead of the direction-of-arrival.…”
Section: Beamforming and Post-filtering
confidence: 99%
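The point about estimating a relative transfer function rather than a direction-of-arrival can be sketched as follows. This is a simplified illustration in the spirit of the mask-based MVDR of Yoshioka et al. (2015): the steering vector is taken as the principal eigenvector of an (assumed given) speech spatial covariance matrix; the mask estimation and reference-channel normalization details of the actual method are omitted.

```python
import numpy as np

def mvdr_weights(phi_xx, phi_nn):
    """MVDR weights for one frequency bin from speech and noise spatial
    covariance matrices (M x M). Simplified sketch: the steering vector is
    the principal eigenvector of the speech covariance, i.e. an estimate of
    the relative transfer function, not a direction-of-arrival model.
    """
    # Principal eigenvector of the speech covariance as the steering vector.
    _, eigvecs = np.linalg.eigh(phi_xx)
    d = eigvecs[:, -1]
    phi_nn_inv = np.linalg.inv(phi_nn)
    num = phi_nn_inv @ d
    # Distortionless constraint: w^H d = 1.
    return num / (d.conj() @ num)
```

Because the steering vector comes from the data covariance itself, errors in the assumed speaker direction do not enter the estimate.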
“…Heymann et al (2015) employ a DNN to perform the necessary speech and noise covariance estimates. Other teams have employed a conventional delay and sum beamformer (e.g., Sivasankaran et al, 2015;Hori et al, 2015;Prudnikov et al, 2015). Of these, several reported that the freely available BeamformIt tool developed by Anguera et al (2007) worked very effectively.…”
Section: Target Enhancement
confidence: 99%
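The conventional delay-and-sum beamformer mentioned above fits in a few lines. This is a generic time-domain sketch with given integer sample delays, not the actual BeamformIt implementation (which also estimates the delays and weights the channels).

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Time-domain delay-and-sum beamformer (sketch).

    channels: equal-length 1-D signals, one per microphone.
    delays: integer sample delays aligning each channel to the reference.
    Each channel is advanced by its delay and the results are averaged.
    """
    channels = [np.asarray(c, dtype=float) for c in channels]
    out = np.zeros(len(channels[0]))
    for c, d in zip(channels, delays):
        # np.roll wraps around; real implementations zero-pad instead.
        out += np.roll(c, -d)
    return out / len(channels)
```

Averaging the aligned channels reinforces the target signal while uncorrelated noise is attenuated.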
“…However, the two most effective techniques are transforming the DNN features using feature-space maximum likelihood linear regression (fMLLR) (Hori et al., 2015; Moritz et al., 2015; Vu et al., 2015; Sivasankaran et al., 2015; Tran et al., unpublished) or augmenting the DNN features using either i-vectors (e.g., Moritz et al., 2015; Zhuang et al., 2015), pitch-based features (Ma et al., 2015; Wang et al., 2015; Du et al., 2015), or bottleneck features (Tachioka et al., 2015), i.e., features extracted from bottleneck layers in speaker classification DNNs. Where i-vectors have been used, they may be either per-speaker (e.g., Prudnikov et al., 2015) or per-speaker-environment (e.g., Ma et al., 2015).…”
Section: Feature Design
confidence: 99%
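The feature-augmentation scheme referenced above amounts to appending a fixed per-speaker (or per-speaker-environment) vector to every acoustic frame before the DNN input. A minimal sketch, with shapes as the only assumption:

```python
import numpy as np

def augment_with_ivector(frames, ivector):
    """Append a fixed i-vector to every acoustic frame.

    frames: (T, F) array of frame-level features.
    ivector: (D,) utterance- or speaker-level vector.
    Returns a (T, F + D) array fed to the acoustic model.
    """
    frames = np.asarray(frames)
    iv = np.tile(np.asarray(ivector), (frames.shape[0], 1))
    return np.concatenate([frames, iv], axis=1)
```

The DNN then sees the same speaker/environment summary at every frame, which lets it normalize for speaker characteristics implicitly.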
“…The MCA algorithm was successfully applied in our submission to the CHiME Challenge 2015, where it demonstrated competitive results compared to several well-known beamforming algorithms [20]. The important characteristics of MCA include implementation simplicity, low computational complexity, and resistance to target direction errors.…”
Section: Multichannel Alignment (MCA)
confidence: 99%
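The quote does not spell out the MCA algorithm itself. As a related illustration of low-complexity channel alignment, here is a standard GCC-PHAT time-delay estimator — explicitly not MCA, only the usual cross-correlation building block on which multichannel alignment methods are commonly based.

```python
import numpy as np

def gcc_phat_delay(sig, ref):
    """Estimate the integer-sample delay of `sig` relative to `ref`
    (equal-length signals) via GCC-PHAT: cross-spectrum phase only,
    so the estimate is robust to spectral coloration.
    """
    n = len(ref)
    cross = np.fft.rfft(sig) * np.conj(np.fft.rfft(ref))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)   # circular cross-correlation
    lag = int(np.argmax(cc))
    if lag > n // 2:                # map to signed lag
        lag -= n
    return lag
```

A simplicity/robustness trade-off like this is consistent with the low computational complexity the authors highlight, though the actual MCA details are in [20].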