2012
DOI: 10.1250/ast.33.33

Voice activity detection in noise using modulation spectrum of speech: Investigation of speech frequency and modulation frequency ranges

Abstract: Voice activity detection (VAD) in noisy environments is a very important preprocessing scheme in speech communication technology, a field which includes speech recognition, speech coding, speech enhancement and captioning video contents. We have developed a VAD method for noisy environments based on the modulation spectrum. In Experiment 1, we investigate the optimal ranges of speech and modulation frequencies for the proposed algorithm by using the simulated data in the CENSREC-1-C corpus. Results show that w…

Citations: cited by 2 publications (3 citation statements)
References: 24 publications
“…Human vocal frequencies are constricted to a specific range, generally 80Hz to 4000Hz are considered vocal frequencies [18]. Thereby audio segments containing no vocal frequencies or very few vocal frequencies for a very short duration could be considered noise only segments.…”
Section: B Vocal Filtering (mentioning)
confidence: 99%
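
The band-based noise test described in this citation statement can be illustrated with a minimal sketch. It assumes a frame is judged by the fraction of its spectral energy that falls between 80 Hz and 4000 Hz; the function names, the 0.5 threshold, and the naive DFT are illustrative assumptions, not the citing paper's implementation.

```python
import cmath
import math


def band_energy_fraction(samples, sample_rate, low_hz=80.0, high_hz=4000.0):
    """Fraction of a frame's spectral energy inside [low_hz, high_hz].

    Uses a naive DFT over the positive-frequency bins (DC is skipped).
    The 80-4000 Hz defaults follow the range quoted in the statement.
    """
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if low_hz <= freq <= high_hz:
            in_band += energy
    return in_band / total if total > 0 else 0.0


def is_noise_only(samples, sample_rate, min_fraction=0.5):
    """Hypothetical decision rule: too little in-band energy means noise only."""
    return band_energy_fraction(samples, sample_rate) < min_fraction


def tone(freq_hz, sample_rate=16000, n=400):
    """Pure sine test frame (25 ms at 16 kHz by default)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

For example, `is_noise_only(tone(200), 16000)` is false because a 200 Hz tone lies inside the band, while a 6000 Hz tone is flagged as noise only. A real system would likely use an FFT library and also apply the "very short duration" condition across several frames.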
“…Using voice activity detection (VAD) [18], duration of vocal frequencies occurring is identified. The ratio of the vocal to non-vocal frequencies duration (V-NV) is calculated per segment using the identified duration and total duration.…”
Section: B Vocal Filtering (mentioning)
confidence: 99%
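
As a rough illustration of the V-NV ratio mentioned in this statement, the sketch below assumes a VAD has already labelled each equal-length frame of a segment as vocal or non-vocal. The function name and the handling of edge cases are assumptions; the citing paper does not spell out its exact formula here.

```python
import math


def v_nv_ratio(voiced_flags):
    """Ratio of vocal to non-vocal duration for one segment.

    voiced_flags holds one boolean per frame from some VAD. Because all
    frames are assumed to have equal duration, the ratio of frame counts
    equals the ratio of durations.
    """
    vocal = sum(1 for flag in voiced_flags if flag)
    non_vocal = len(voiced_flags) - vocal  # total minus vocal duration
    if non_vocal == 0:
        # Assumed convention: fully vocal segment gives infinity,
        # an empty segment gives 0.0.
        return math.inf if vocal else 0.0
    return vocal / non_vocal
```

A segment with three vocal frames and two non-vocal frames gives `v_nv_ratio([True, True, True, False, False]) == 1.5`; a segment with a ratio near zero would be a candidate for the noise-only label described in the preceding statement.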
“…Periodic to aperiodic component ratios were employed in [9]. Pek et al [10] used modulation indices of the modulation spectra of speech data. Kinnunen and Rajad [11] introduced likelihood ratio-based VAD method in which speech and non-speech models are trained on an utterance-by-utterance basis using mel-frequency cepstral coefficients (MFCCs).…”
Section: Introduction (mentioning)
confidence: 99%