2018
DOI: 10.1186/s13636-018-0135-7

Enhancement of speech dynamics for voice activity detection using DNN

Abstract: Voice activity detection (VAD) is an important preprocessing step for various speech applications to identify speech and non-speech periods in input signals. In this paper, we propose a deep neural network (DNN)-based VAD method for detecting such periods in noisy signals using speech dynamics, which are time-varying speech signals that may be expressed as the first- and second-order derivatives of mel cepstra, also known as the delta and delta-delta features. Unlike these derivatives, in this paper, the dynami…
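
As context for the abstract's reference to delta and delta-delta features, the sketch below shows one conventional way to derive them from mel cepstra with librosa. It is not the paper's implementation; the file path, sampling rate, and coefficient count are illustrative assumptions.

```python
# Minimal sketch (not the paper's method): conventional delta and delta-delta
# features, i.e., first- and second-order time derivatives of the mel cepstra,
# as referenced in the abstract.
import numpy as np
import librosa

# Placeholder file and parameters (assumptions, not from the paper).
y, sr = librosa.load("example.wav", sr=16000)

# Static mel-cepstral features (13 MFCCs per frame).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# First-order (delta) and second-order (delta-delta) derivatives over time.
delta = librosa.feature.delta(mfcc, order=1)
delta2 = librosa.feature.delta(mfcc, order=2)

# Stack static + dynamic features as a typical per-frame input to a DNN VAD.
features = np.vstack([mfcc, delta, delta2])  # shape: (39, n_frames)
```

Note that the abstract contrasts the proposed speech dynamics with these plain derivatives, so the block above only illustrates the conventional baseline features, not the enhancement the paper proposes.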

Cited by 10 publications (2 citation statements). References: 31 publications.
“…In addition to SE, voice activity detection (VAD) plays an important role in speech-related applications [18]. Typically, VAD is carried out using the clean speech estimates from SE modules in noisy environments [19][20][21]. In this case, statistical or NN-based VAD requires hand-labeled annotations to train the VAD models.…”
Section: Introduction
Mentioning confidence: 99%
“…The submitted SAD system is based on the multilayer perceptron (MLP), which has been shown to be simple and efficient [5,6,7]. We significantly improved the performance by tuning post-processing parameters, and experimented with untranscribed data by means of ASR transcriptions of the 19k hour Apollo-11 (A11) dataset.…”
Section: Introduction
Mentioning confidence: 99%
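
The last citation statement describes an MLP-based speech activity detection (SAD) system. A hedged sketch of such a frame-level speech/non-speech classifier is given below; the network size, synthetic data, and decision threshold are illustrative assumptions, not the cited submission's configuration.

```python
# Sketch of an MLP frame classifier for speech activity detection,
# in the spirit of the cited description (not the actual submission).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 39))   # placeholder per-frame features (e.g., MFCC + deltas)
y = rng.integers(0, 2, size=1000)     # placeholder frame labels: 1 = speech, 0 = non-speech

# Small two-layer MLP; sizes are assumptions for illustration only.
clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=200)
clf.fit(X, y)

# Frame-level speech posteriors; a real system would smooth these before
# thresholding (the kind of post-processing tuning mentioned in the quote).
posteriors = clf.predict_proba(X)[:, 1]
speech_frames = posteriors > 0.5
```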