2007
DOI: 10.1016/j.specom.2007.05.003

Exploiting correlogram structure for robust speech recognition with multiple speech sources

Abstract: International audience

Cited by 41 publications (25 citation statements)
References: 26 publications

“…Both quantities are computed using (28) and (29) given the current estimate of the noise model M_n.…”
Section: Noise Model Estimation (mentioning, confidence: 99%)

“…Unlike the above mentioned marginalisation method, the SFD technique carries out both mask estimation and speech recognition at the same time by searching for the optimal segregation mask and HMM state sequence given a set of time-frequency fragments identified prior to the decoding stage. These fragments correspond to patches in the noisy spectrum that are dominated by the energy of an acoustic source [28]. Thus, the SFD approach determines the most likely set of speech fragments among all the possible combinations of source fragments by exploiting knowledge of the speech source provided by the speech models in the recogniser.…”
Section: Comparison With Other Missing-data Techniques (mentioning, confidence: 99%)

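The fragment search described in this statement can be illustrated with a toy, exhaustive version. This is a sketch under stated assumptions, not the cited decoder: `toy_score`, `best_fragment_labelling`, the single Gaussian "speech model" and the constant background log-likelihood `bg_loglik` are all hypothetical stand-ins for the recogniser's HMM likelihoods. The code tries every speech/background labelling of the fragments and keeps the mask with the highest score.

```python
import itertools

import numpy as np


def toy_score(spec, mask, mean, var, bg_loglik):
    """Hypothetical stand-in for the recogniser's likelihood of a segregation mask.

    Cells labelled speech (mask True) are scored under a diagonal Gaussian
    "speech model"; every other cell contributes a fixed background log-likelihood.
    """
    ll_speech = -0.5 * (np.log(2 * np.pi * var) + (spec - mean) ** 2 / var)
    return float(np.sum(np.where(mask, ll_speech, bg_loglik)))


def best_fragment_labelling(spec, fragments, mean, var, bg_loglik):
    """Try every speech/background labelling of the fragments; keep the best."""
    best_labels, best_score = None, -np.inf
    for labels in itertools.product([False, True], repeat=len(fragments)):
        # Segregation mask = union of the fragments labelled as speech.
        mask = np.zeros(spec.shape, dtype=bool)
        for frag, is_speech in zip(fragments, labels):
            if is_speech:
                mask |= frag
        score = toy_score(spec, mask, mean, var, bg_loglik)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score
```

Enumeration costs 2^K score evaluations for K fragments, so it is only practical for tiny examples; as far as we understand, speech fragment decoders avoid it by folding the fragment labelling into the Viterbi search itself.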
“…A fragment decoding system then attempts to interpret the high-energy regions that are not accounted for by the noise floor model. The first step is to separately generate soft missing data masks (using the adaptive noise tracker) and fragments (using harmonicity-based techniques [36]) from the noisy signals.…”
Section: Combining SFD and Noise Floor Modeling (mentioning, confidence: 99%)

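The statement does not say how the adaptive noise tracker's estimate is turned into a soft missing-data mask. One common choice in missing-data recognition, assumed here rather than taken from the cited work, is a sigmoid of the local SNR, so that cells whose energy clearly exceeds the noise floor receive values near 1. The function name `soft_mask` and the `slope`, `centre_db` and `floor` parameters are hypothetical.

```python
import numpy as np


def soft_mask(noisy_power, noise_power, slope=0.5, centre_db=0.0, floor=1e-10):
    """Hypothetical soft mask: sigmoid of the local SNR in dB.

    noisy_power and noise_power are time-frequency power arrays (or scalars);
    the speech power is estimated by floored spectral subtraction.
    """
    speech_power = np.maximum(noisy_power - noise_power, floor)
    local_snr_db = 10.0 * np.log10(speech_power / np.maximum(noise_power, floor))
    # 0.5 at centre_db; slope controls how sharply the mask switches.
    return 1.0 / (1.0 + np.exp(-slope * (local_snr_db - centre_db)))
```

Cells with values near 1 correspond to the high-energy regions not accounted for by the noise floor model, which are the regions the fragment decoder then has to interpret.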
“…This work employs techniques for tracking multiple pitches of simultaneous sounds in the autocorrelogram domain and use this information to identify fragments [36]. In brief, a running short-time autocorrelation is computed on the output of each gammatone filter using a 30-ms Hann window.…”
Section: B. Fragment Generation (mentioning, confidence: 99%)

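A minimal NumPy sketch of the running autocorrelogram described in this statement follows. Only the 30-ms Hann window comes from the quoted text; the truncated FIR gammatone approximation (4th order, bandwidth 1.019 × ERB), the 10-ms hop, the 12.5-ms maximum lag and all function names are assumptions made for illustration. Each frame's value at a given lag is the Hann-weighted sum of products between a channel's output and a lagged copy of itself.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_fir(fc, fs, order=4, duration=0.05):
    """Truncated gammatone impulse response used as an FIR filter (peak-normalised)."""
    t = np.arange(int(duration * fs)) / fs
    env = t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb(fc) * t)
    g = env * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))


def autocorrelogram(x, fs, centre_freqs, win_ms=30.0, hop_ms=10.0, max_lag_ms=12.5):
    """Running short-time autocorrelation of each gammatone channel.

    Returns an array of shape (frames, channels, lags) with
    acg[i, c, lag] = sum_k w[k] * y_c[s + k] * y_c[s + k + lag], where s = i * hop.
    """
    win = int(round(win_ms * fs / 1000))
    hop = int(round(hop_ms * fs / 1000))
    max_lag = int(round(max_lag_ms * fs / 1000))
    if len(x) < win + max_lag:
        raise ValueError("signal shorter than one window plus the maximum lag")
    w = np.hanning(win)  # Hann window, 30 ms by default
    n_frames = (len(x) - win - max_lag) // hop + 1
    acg = np.zeros((n_frames, len(centre_freqs), max_lag + 1))
    for c, fc in enumerate(centre_freqs):
        y = np.convolve(x, gammatone_fir(fc, fs))[: len(x)]  # causal filter output
        for i in range(n_frames):
            s = i * hop
            # Row `lag` of `lagged` is y[s + lag : s + lag + win].
            lagged = sliding_window_view(y[s : s + win + max_lag], win)
            acg[i, c] = lagged @ (w * y[s : s + win])
    return acg
```

For a periodic source, the channels it dominates show peaks at multiples of its pitch period (80 samples for a 200 Hz tone sampled at 16 kHz), which is the structure that multipitch tracking in the autocorrelogram domain exploits.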
“…Excitation features, such as voicing and fundamental frequency, are used in many speech processing applications and include, for example, speech coding, enhancement, noise estimation, automatic speech recognition in noisy conditions and tonal language speech recognition (Kaewtip et al., 2013; Kawahara et al., 2001; Lei et al., 2006; Ma et al., 2007; McAulay and Champion, 1990; Morales-Cordovilla et al., 2011a,b). Similarly, spectral envelope and formant features are used in a range of applications such as speech coding, synthesis, recognition and voice conversion (Hermansky, 1990; Kawahara et al., 2001, 2009; Koriyama et al., 2014).…”
Section: Introduction (mentioning, confidence: 99%)