2012
DOI: 10.1186/1687-4722-2012-22

Biomimetic multi-resolution analysis for robust speaker recognition

Abstract: Humans exhibit a remarkable ability to reliably classify sound sources in the environment even in the presence of high levels of noise. In contrast, most engineering systems suffer a drastic drop in performance when speech signals are corrupted with channel or background distortions. Our brains are equipped with elaborate machinery for speech analysis and feature extraction, which holds great lessons for improving the performance of automatic speech processing systems under adverse conditions. The work presented he…

Citations: cited by 5 publications (6 citation statements)
References: 22 publications
“…Increased slowness in the modulation domain is realized in the time-frequency domain as an overall broadening and reorientation of the STRFs, and reflects an enhancement of the modulations known to characterize speech and other natural sounds [11], [25], [26], [28]. The fact that we obtain improved SAD results using “slower” filters is consistent with other strategies that concentrate the feature extraction pipeline to the range of speech-specific modulations [27], [94]–[96]. Moreover, the STRF adaptation patterns observed here are broadly compatible with traditional signal processing schemes that emphasize slow modulations for improving noise robustness in speech tasks [97].…”
Section: Discussion; citation type: supporting
confidence: 81%
“…We contend that because nature has converged to a robust solution for handling unseen and noisy acoustics, there is much to leverage from auditory neurophysiology when designing automated sound processing systems. Generally speaking, cortically inspired feature representations based on spectro-temporal receptive fields underlie a number of successful approaches to noise robust speech activity detection [27], speech and speaker recognition [94]–[96], [99], and auditory scene classification [100]. The present study, in concert with other recent work in our lab [31], [101], [102], represents an extension of this methodology by incorporating the cognitive effects of dynamic, task-driven sensory adaptation as part of the feature extraction pipeline.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…The choice of model parameters builds on the current knowledge of psychophysical principles of speech perception in noise [8] complemented with a statistical analysis of the dependencies between spectral details of the message and speaker information. More details are available in [6].…”
Section: Frequency Domain Linear Predictors (FDLP) [5]; citation type: mentioning
confidence: 99%
“…1 shows the schematic of our system which is a good example of this principle. In particular, the overall architecture uses five complementary features [3,4,5,6] that are transformed into speaker supervectors and i-vectors and used in three different classifiers [11,12,13]. The final recognition score is obtained by fusing the scores produced by each classifier via logistic regression [9].…”
Section: Introduction; citation type: mentioning
confidence: 99%
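
The last statement above describes fusing per-classifier scores via logistic regression. A minimal sketch of that idea follows, under loud assumptions: the three "classifiers", the synthetic trial scores, the learning rate, and the function names `train_fusion` and `fuse` are all hypothetical and are not taken from the cited system, whose actual fusion setup is not described here.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fuse(w, b, s):
    # Fused score: weighted sum of the per-classifier scores plus a bias.
    return sum(wi * si for wi, si in zip(w, s)) + b

def train_fusion(scores, labels, lr=0.1, epochs=500):
    # Fit w and b by batch gradient descent on the logistic loss, so that
    # sigmoid(fuse(w, b, s)) approximates P(target trial | scores s).
    n, k = len(scores), len(scores[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(fuse(w, b, s)) - y
            for j in range(k):
                grad_w[j] += err * s[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic data (assumption): 3 hypothetical classifiers, 100 trials.
# Target trials (label 1) score around +1, non-target trials around -1.
random.seed(0)
labels = [1] * 50 + [0] * 50
scores = [[random.gauss(1.0 if y else -1.0, 1.0) for _ in range(3)]
          for y in labels]

w, b = train_fusion(scores, labels)
print("fusion weights:", w, "bias:", b)
print("fused score for a new trial:", fuse(w, b, [0.8, 1.5, -0.2]))
```

The fused value is a single score per trial that can be thresholded for the accept/reject decision; in practice the weights would be trained on held-out development scores rather than on synthetic data.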