Interspeech 2013
DOI: 10.21437/interspeech.2013-199

All for one: feature combination for highly channel-degraded speech activity detection

Abstract: Speech activity detection (SAD) on channel transmissions is a critical preprocessing task for speech, speaker and language recognition or for further human analysis. This paper presents a feature combination approach to improve SAD on highly channel-degraded speech as part of the Defense Advanced Research Projects Agency's (DARPA) Robust Automatic Transcription of Speech (RATS) program. The key contribution is the feature combination exploration of different novel SAD features based on pitch and spectro-tempor…

citations
Cited by 24 publications
(4 citation statements)
references
References 11 publications
“…The VAD needs to work accurately in challenging environments, including noisy conditions, reverberant environments and environments with competing speech. Significant research has been devoted to finding the optimal VAD features and models [1,2,3,4,5]. In the literature, LSTM based VAD is a popular architecture for sequential modeling of the VAD task, showing state-of-the-art performance [3,5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Several prior works have focused on finding better discriminative features for supervised classification [14]- [20]. For instance in [17] the authors suggest a combination of MFCCs and Gabor features. In [21] the authors suggest the use of source and filter based features and perform a score level fusion.…”
Section: Related Work On Supervised Speech Activity Detection (mentioning)
confidence: 99%
“…Traditionally, SAD is formulated as a statistical hypothesis test employing probabilistic models, such as Gaussians, mixtures of Gaussians, or Laplacian distributions [6,10,11,12]. During the last decade, however, deep neural networks (DNNs) have achieved impressive results on some of the more taxing SAD tasks, outperforming the traditional approaches [8,13,14].…”
Section: Introduction (mentioning)
confidence: 99%
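
The hypothesis-test formulation mentioned in the last statement can be illustrated with a minimal sketch. It is not taken from the cited papers: the single log-energy feature, the Gaussian parameters, the threshold, and the names `gaussian_log_pdf` and `llr_sad` are all assumptions chosen for illustration. Each frame is labeled speech when the log-likelihood ratio between a speech Gaussian and a non-speech Gaussian exceeds the threshold.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def llr_sad(frame_energies, speech=(-20.0, 25.0), nonspeech=(-50.0, 25.0),
            threshold=0.0):
    """Label each frame as speech (True) or non-speech (False).

    `speech` and `nonspeech` are (mean, variance) pairs for the frame
    log-energy in dB. These values are hypothetical placeholders; in
    practice they would be estimated from labeled training data.
    """
    decisions = []
    for energy in frame_energies:
        # Log-likelihood ratio: log p(x | speech) - log p(x | non-speech)
        llr = (gaussian_log_pdf(energy, *speech)
               - gaussian_log_pdf(energy, *nonspeech))
        decisions.append(llr > threshold)
    return decisions


# Hypothetical per-frame log-energies in dB
energies_db = [-55.0, -48.0, -22.0, -18.0, -60.0]
print(llr_sad(energies_db))
```

Raising `threshold` trades missed speech for fewer false alarms. Systems of the kind quoted above typically replace the single Gaussian per class with mixtures of Gaussians over richer features, or with a DNN that outputs the speech posterior directly.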