2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
DOI: 10.1109/asru.2015.7404825

Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection

Cited by 13 publications (14 citation statements); references 17 publications.

“…Again, these techniques operate by estimating the relative transfer function for the target speaker and the interfering sources from data. As expected, Bagchi et al. (2015) and Fujita et al. (2015) reported similar performance for these two techniques on real and simulated data. Single-channel enhancement based on nonnegative matrix factorization (NMF) of the power spectra of speech and noise has also been used and resulted in minor improvement on both real and simulated data (Bagchi et al., 2015; Vu et al., 2015).…”
Section: Source Separation (supporting)
confidence: 62%
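
For readers unfamiliar with the NMF-based enhancement mentioned in this excerpt, the following is a minimal sketch of the general approach, assuming speech and noise basis matrices have already been learned offline; the variable names, shapes, and the KL-divergence multiplicative update are illustrative choices, not details taken from the cited systems.

```python
import numpy as np

def nmf_enhance(V, W_speech, W_noise, n_iter=50, eps=1e-10):
    """Wiener-like masking from NMF with pretrained speech/noise bases.

    V        : (F, T) noisy power spectrogram
    W_speech : (F, Ks) pretrained speech basis spectra
    W_noise  : (F, Kn) pretrained noise basis spectra
    Returns the masked (enhanced) power spectrogram.
    """
    W = np.hstack([W_speech, W_noise])            # (F, Ks + Kn), kept fixed
    Ks = W_speech.shape[1]
    H = np.random.rand(W.shape[1], V.shape[1]) + eps

    for _ in range(n_iter):
        # Multiplicative update for the activations H (KL divergence).
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat)) / (W.T @ np.ones_like(V) + eps)

    speech_part = W[:, :Ks] @ H[:Ks]              # speech reconstruction
    noise_part = W[:, Ks:] @ H[Ks:]               # noise reconstruction
    mask = speech_part / (speech_part + noise_part + eps)
    return mask * V                               # Wiener-style filtering
```

The masked spectrogram would then be combined with the noisy phase and inverted back to the time domain before feature extraction.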
“…This claim also holds true for system combination based on recognizer output voting error reduction (ROVER) (Fiscus, 1997), as reported by Fujita et al. (2015). This comes as no surprise, as these techniques are somewhat orthogonal to acoustic modeling and they are either trained on separate material or do not rely on training at all.…”
Section: Language Modeling and ROVER Fusion (mentioning)
confidence: 61%
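
As context for the ROVER combination discussed in this excerpt, here is a deliberately simplified sketch of the voting step, assuming the recognizer hypotheses have already been aligned into equal-length word slots; the actual ROVER procedure (Fiscus, 1997) builds that alignment with dynamic programming over a word transition network and can also weight votes by confidence scores.

```python
from collections import Counter
from itertools import zip_longest

def rover_vote(hypotheses):
    """Toy word-level voting over already-aligned hypotheses.

    `hypotheses` is a list of word lists of equal length; NULL (deletion)
    slots are represented by empty strings.  The alignment step of true
    ROVER is omitted here.
    """
    combined = []
    for slot in zip_longest(*hypotheses, fillvalue=""):
        word, _count = Counter(slot).most_common(1)[0]
        if word:                      # drop slots where NULL wins the vote
            combined.append(word)
    return combined

# Example: three recognizer outputs for the same utterance.
hyps = [["the", "cat", "sat"],
        ["the", "cat", "sad"],
        ["a",   "cat", "sat"]]
print(rover_vote(hyps))   # ['the', 'cat', 'sat']
```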
“…The simplest approach has been to apply utterance-based feature mean and variance normalization (Zhao et al., 2015; Fujita et al., 2015; Du et al., 2015; Wang et al., 2015). However, the two most effective techniques are transforming the DNN features using feature-space maximum likelihood linear regression (fMLLR) (Hori et al., 2015; Moritz et al., 2015; Vu et al., 2015; Sivasankaran et al., 2015; Tran et al., unpublished) or augmenting the DNN features with either i-vectors (e.g., Moritz et al., 2015; Zhuang et al., 2015), pitch-based features (Ma et al., 2015; Wang et al., 2015; Du et al., 2015), or bottleneck features (Tachioka et al., 2015), i.e., features extracted from bottleneck layers in speaker-classification DNNs.…”
Section: Feature Design (mentioning)
confidence: 99%
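
The utterance-based mean and variance normalization cited above as the simplest approach can be sketched in a few lines; the function below is a generic illustration, not code from any of the cited systems.

```python
import numpy as np

def utterance_cmvn(features, eps=1e-8):
    """Utterance-level feature mean and variance normalization.

    `features` is a (T, D) array of frame-level features for one utterance.
    Each dimension is shifted to zero mean and scaled to unit variance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)
```

The more effective fMLLR transforms and i-vector, pitch, or bottleneck augmentations mentioned in the excerpt replace or follow this per-utterance step with speaker- and channel-dependent transforms or appended auxiliary vectors.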
“…In Fujita et al. (2015) and Zhao et al. (2015), DNN filterbank features have been supplemented by delta and delta-delta features. Zhao et al. (2015) show that this provides a significant improvement over using the 11 frames of filterbank features alone.…”
Section: Feature Design (mentioning)
confidence: 99%
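
The delta and delta-delta supplementation described in this excerpt typically follows the standard regression formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n} n^2). The sketch below assumes a window of N = 2 frames with edge replication, which may differ from the exact configuration used by Fujita et al. (2015) or Zhao et al. (2015).

```python
import numpy as np

def add_deltas(feats, N=2):
    """Append delta and delta-delta (acceleration) features to (T, D) feats."""
    def delta(x):
        denom = 2 * sum(n * n for n in range(1, N + 1))
        padded = np.pad(x, ((N, N), (0, 0)), mode="edge")   # replicate edges
        d = np.zeros_like(x)
        for n in range(1, N + 1):
            d += n * (padded[N + n:len(x) + N + n] - padded[N - n:len(x) + N - n])
        return d / denom

    d1 = delta(feats)          # first-order (delta) features
    d2 = delta(d1)             # second-order (delta-delta) features
    return np.hstack([feats, d1, d2])   # (T, 3 * D)
```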