6th European Conference on Speech Communication and Technology 1999
DOI: 10.21437/eurospeech.1999-154

The full combination sub-bands approach to noise robust HMM/ANN based ASR

Abstract: The performance of most ASR systems degrades rapidly with data mismatch relative to the data used in training. Under many realistic noise conditions, a significant proportion of the spectral representation of a speech signal, which is highly redundant, remains uncorrupted. In the "missing feature" approach to this problem, mismatching data is simply ignored, but the need to base recognition on unorthogonalised spectral features results in reduced performance in clean speech. In multiband ASR the results from ind…

Cited by: 25 publications (5 citation statements)
References: 11 publications
“…where the validation confusion probabilities matrix estimated using only frames detected as speech, and is estimated from all other frames. We used here a hard speech/non-speech decision based on the frame energy level and estimated noise energy alone (one could use a more sophisticated speech/non-speech detector) as follows, (9) if else (10) where in each frame is estimated as ( is the unused PLP coefficient), and is estimated as the average value over the first frames in each utterance (see Fig. 4).…”
Section: Model
Citation type: mentioning
confidence: 99%
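A hard, energy-based speech/non-speech decision of the kind this statement describes can be sketched as follows. The quoted passage does not preserve the decision rule of equations (9) and (10), so the multiplicative `margin` test, the function names, and the default values here are assumptions for illustration only, not the cited authors' rule. The only parts taken from the quote are that the decision uses frame energy and a noise energy estimate, and that the noise energy is averaged over the first frames of each utterance.

```python
def noise_energy(frame_energies, n_init):
    """Estimate noise energy as the mean over the first n_init frames."""
    head = frame_energies[:n_init]
    return sum(head) / len(head)


def is_speech(frame_energies, n_init=3, margin=2.0):
    """Hard per-frame decision.

    A frame is labelled speech when its energy exceeds `margin` times the
    noise estimate. Both `n_init` and `margin` are hypothetical parameters.
    """
    noise = noise_energy(frame_energies, n_init)
    return [energy > margin * noise for energy in frame_energies]


# Toy per-frame energies: three leading noise frames, two speech frames, one noise frame.
energies = [1.0, 1.5, 0.5, 5.0, 6.0, 1.0]
labels = is_speech(energies)
```

The resulting boolean mask could then be used to split frames into the two groups from which the separate confusion matrices mentioned in the statement are estimated.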
“…Speech features used were cepstral domain PLP [6], excluding the energy coefficient "c0". Recognition was performed using the "all combinations multi-stream hybrid" model [9,10]. In this system a separate one hidden layer MLP is trained for each of the seven non-empty combinations of the three cepstral feature streams (PLP, delta PLP, delta delta PLP) (where delta PLP coefficients were over 9 windows, and delta delta over 9 windows).…”
Section: Recognition Tests
Citation type: mentioning
confidence: 99%
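The count of seven experts in this statement follows from enumerating every non-empty subset of the three feature streams (2^3 - 1 = 7). A minimal sketch of that enumeration, with the stream names taken from the quote and the function name chosen here for illustration:

```python
from itertools import combinations

# Stream names as given in the citation statement.
STREAMS = ("PLP", "delta PLP", "delta delta PLP")


def non_empty_combinations(streams):
    """Return every non-empty subset of `streams`, smallest subsets first."""
    subsets = []
    for size in range(1, len(streams) + 1):
        subsets.extend(combinations(streams, size))
    return subsets


combos = non_empty_combinations(STREAMS)
for combo in combos:
    print(" + ".join(combo))
```

In the system described, each of these subsets would get its own one-hidden-layer MLP; the training itself is outside the scope of this sketch.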
“…Provided that the combined features input to each ANN expert are merely concatenated (i.e. no compression, orthogonalisation, or whatever is applied), the expected posteriors for each position of missing features can be computed directly from the IDCN parameters, and then simply combined in a linearly weighted sum [11] or geometrically weighted product [5].…”
Section: Position Of Missing Data Unknown
Citation type: mentioning
confidence: 99%
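The two combination rules named in this statement, a linearly weighted sum and a geometrically weighted product of expert posteriors, can be compared in a small sketch. The function names, the toy posteriors, and the weights are assumptions made here; renormalising the product so that it sums to one is also an assumption, since the quote does not say how the product is normalised.

```python
import math


def weighted_sum(posteriors, weights):
    """Linear combination: sum_i w_i * P_i(class)."""
    n_classes = len(posteriors[0])
    return [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]


def weighted_product(posteriors, weights):
    """Geometric combination: prod_i P_i(class) ** w_i, renormalised (assumed)."""
    n_classes = len(posteriors[0])
    raw = [
        math.prod(p[k] ** w for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]
    total = sum(raw)
    return [r / total for r in raw]


# Two hypothetical experts over two classes, equally weighted.
experts = [[0.8, 0.2], [0.5, 0.5]]
weights = [0.5, 0.5]
linear = weighted_sum(experts, weights)
geometric = weighted_product(experts, weights)
```

With these toy numbers both rules favour the first class; the product rule gives it slightly more mass than the sum rule.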
“…If the position of missing data is not known, one successful approach [6,11,12] has been to train a separate classifier for each possible position of missing data and then to combine the posteriors for one class as a weighted sum over all classifiers. Even with equal weights this approach shows some robustness to missing data, because "uncertain" classifiers tend to contribute equal and therefore small probabilities to each class.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
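The robustness argument in this statement can be illustrated with toy numbers: when most classifiers are "uncertain" and output near-uniform posteriors, an equal-weight sum still ranks the classes as the one confident classifier does. The posterior values and the function name are hypothetical, chosen only to make the effect visible.

```python
def combine_equal_weights(posteriors):
    """Average the class posteriors of all classifiers with equal weights."""
    n_classifiers = len(posteriors)
    n_classes = len(posteriors[0])
    return [
        sum(p[k] for p in posteriors) / n_classifiers
        for k in range(n_classes)
    ]


# One classifier sees clean data; two see corrupted data and are near-uniform.
confident = [0.85, 0.05, 0.05, 0.05]
uncertain = [0.25, 0.25, 0.25, 0.25]

combined = combine_equal_weights([confident, uncertain, uncertain])
best = max(range(len(combined)), key=combined.__getitem__)
```

The uniform classifiers add the same amount to every class, so they shrink the margin between classes without changing which class wins.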