2012
DOI: 10.1109/tasl.2011.2165945

Combining Speech Fragment Decoding and Adaptive Noise Floor Modeling

Abstract: This paper presents a novel noise-robust automatic speech recognition (ASR) system that combines aspects of the noise modeling and source separation approaches to the problem. The combined approach has been motivated by the observation that the noise backgrounds encountered in everyday listening situations can be roughly characterized as a slowly varying noise floor in which there are embedded a mixture of energetic but unpredictable acoustic events. Our solution combines two complementary techniques.…

Cited by 11 publications (5 citation statements)
References 40 publications
“…The system then proceeds by searching for the best combination of fragments that represent the target source. This can be further enhanced by utilising knowledge about the noise present in the signal [144]. An alternative approach in [108] uses an uncertainty decoder to allow for errors made in the mask estimation process.…”
Section: Auditory Modelling (mentioning)
confidence: 99%
“…A vector is said to belong to the cluster that is most likely to have generated it. As the distribution of the vector is assumed to be Gaussian, the cluster membership $m_{\vec{x}(t)}$ of a vector $\vec{x}(t)$ is defined as (15) and then the unreliable components of the vector are reconstructed using MAP estimation method.…”
Section: Map Estimation For Unreliable Components (mentioning)
confidence: 99%
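
The cluster-membership rule in the statement above (assign a vector to the cluster most likely to have generated it, under a Gaussian assumption) can be sketched as follows. Equation (15) is not reproduced on this page, so this is only an illustration under stated assumptions: diagonal covariances, equal cluster priors, and the hypothetical function names `diag_gaussian_logpdf` and `cluster_membership` are not taken from the citing paper.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def cluster_membership(x, means, variances):
    """Return the index of the cluster with the highest Gaussian log-likelihood.

    Equal cluster priors are assumed, so the prior term is omitted.
    """
    scores = [diag_gaussian_logpdf(x, m, v) for m, v in zip(means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)

# Two illustrative clusters in a 2-dimensional feature space.
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
near_second = cluster_membership([4.5, 5.2], means, variances)
near_first = cluster_membership([0.1, -0.3], means, variances)
```

With unequal priors, the log prior of each cluster would simply be added to its score before taking the maximum.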
“…In MDT, two different methods have been considered to perform speech or speaker recognition with incomplete data: marginalization [13][14][15] and reconstruction [16,17]. In marginalization, the unreliable components are discarded or integrated up to the observed values.…”
Section: Introduction (mentioning)
confidence: 99%
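
Marginalization as described in the statement above (unreliable components either discarded or integrated up to the observed values) can be illustrated for a single diagonal-covariance Gaussian. This is a minimal sketch, not the method of any cited paper: the function name `marginal_loglik`, the `bounded` flag, and the lower integration limit of minus infinity are assumptions made for the example.

```python
import math

def gaussian_logpdf(y, mean, var):
    """Log-density of a scalar Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def gaussian_cdf(y, mean, var):
    """Probability mass of a scalar Gaussian below y."""
    return 0.5 * (1.0 + math.erf((y - mean) / math.sqrt(2.0 * var)))

def marginal_loglik(obs, mask, mean, var, bounded=True):
    """Log-likelihood of a partly unreliable observation vector.

    mask[d] is True where component d is reliable. Reliable components
    contribute their density. Unreliable components are either dropped
    (bounded=False) or integrated from -inf up to the observed value
    (bounded=True), treating the observation as an upper bound.
    """
    total = 0.0
    for y, reliable, m, v in zip(obs, mask, mean, var):
        if reliable:
            total += gaussian_logpdf(y, m, v)
        elif bounded:
            # Floor the mass to avoid log(0) for observations far below the mean.
            total += math.log(max(gaussian_cdf(y, m, v), 1e-300))
    return total

obs, mask = [1.0, 2.0], [True, False]
mean, var = [1.0, 2.0], [1.0, 1.0]
discarded = marginal_loglik(obs, mask, mean, var, bounded=False)
integrated = marginal_loglik(obs, mask, mean, var, bounded=True)
```

In this example the unreliable component is observed exactly at the model mean, so bounded integration adds log(0.5) to the score, while plain discarding adds nothing.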
“…Imputation [48,49,50,51,52,53,54,55] is defined as the technique of substituting missing time-frequency components with an estimate of the time-frequency component value based on speech signal's high degree of redundancy. In marginalization [56,57,58,59], missing spectrotemporal regions are ignored and thus, recognition is based on the reliable components of the noisy speech signal's time-frequency representation, where observation likelihoods are computed by integrating over the range of possible values of the missing components. All these methods exploit various speech signals properties to estimate the missing features, from the data correlation expressed through statistical models to sparsity-based estimation where the features are sparsely represented in a given dictionary.…”
Section: Missing Data Methods (mentioning)
confidence: 99%
“…MDT have been extensively applied in the context of robust automatic speech recognition (ASR) as a solution to performance degradation due to noisy speech features, and they are distinguished in two main categories, namely, marginalization and imputation. In marginalization [56,58,59], speech decoding is based on the reliable components of a noisy time-frequency representation, while the unreliable components are eliminated or marginalized up to the observed values. The imputation approach [49,50,51,52,53,54,55] is associated with the estimation of the missing data, so that decoding can be performed in a conventional manner.…”
Section: Introduction (mentioning)
confidence: 99%
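
The imputation approach mentioned in the statements above (estimating the missing components so that decoding can proceed conventionally) might look like the following sketch. It assumes a Gaussian mixture with diagonal covariances as the statistical model, computes component posteriors from the reliable components only, and caps each estimate at the observed noisy value. The function name `impute_unreliable` and the capping rule are illustrative assumptions, not details taken from the cited works.

```python
import math

def impute_unreliable(obs, mask, weights, means, variances):
    """Fill unreliable components with a posterior-weighted mixture mean.

    mask[d] is True where component d is reliable. Component posteriors use
    the reliable dimensions only. Each estimate is capped at the observed
    value, on the assumption that the noisy observation is an upper bound.
    """
    log_post = []
    for w, mean, var in zip(weights, means, variances):
        score = math.log(w)
        for y, reliable, m, v in zip(obs, mask, mean, var):
            if reliable:
                score += -0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
        log_post.append(score)

    # Normalise in a numerically stable way.
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    norm = sum(post)
    post = [p / norm for p in post]

    filled = []
    for d, (y, reliable) in enumerate(zip(obs, mask)):
        if reliable:
            filled.append(y)
        else:
            estimate = sum(p * mean[d] for p, mean in zip(post, means))
            filled.append(min(estimate, y))
    return filled

weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
loose_bound = impute_unreliable([10.0, 20.0], [True, False], weights, means, variances)
tight_bound = impute_unreliable([10.0, 4.0], [True, False], weights, means, variances)
```

In the first call the reliable component points strongly to the second mixture component, so the missing value is filled with roughly that component's mean. In the second call the observed value is lower than the estimate, so the cap applies.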