2017 20th International Conference of Computer and Information Technology (ICCIT) 2017
DOI: 10.1109/iccitechn.2017.8281794

The combination of spectral entropy, zero crossing rate, short time energy and linear prediction error for voice activity detection

citations
Cited by 20 publications
(8 citation statements)
references
References 5 publications
“…To enrich the experimental data and accurately reflect the impact of noise on the algorithms, we label the self-recorded speech files and relabel the CSTR database, which has a short duration. First, accurate voiced/unvoiced information of clean speech is obtained using a method for voice activity detection that combines short-time energy and spectral entropy [45]. Then, we use SWIPE' and BaNa, which have the best performance in [6], to extract the pitch in the voiced segments.…”
Section: B Obtaining Ground-truth Pitch Values and Noisy Speech Database; citation type: mentioning
confidence: 99%
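
The statement above names a voice activity detector that combines short-time energy and spectral entropy but does not say how the two are combined. The following is a minimal sketch under stated assumptions, not the cited method: the frame is flagged as speech when its energy is high and its spectral entropy is low. The function names, the decision rule, and both thresholds are hypothetical and chosen only for illustration.

```python
import cmath
import math
import random


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized power spectrum of one frame."""
    n = len(frame)
    power = []
    # Naive DFT over the non-negative frequency bins; fine for short frames.
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0  # silent frame: no spectrum to measure
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def is_speech(frame, energy_thr=0.01, entropy_thr=3.0):
    """Hypothetical rule: enough energy AND a peaky (low-entropy) spectrum.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return (short_time_energy(frame) > energy_thr
            and spectral_entropy(frame) < entropy_thr)


# A pure tone stands in for a strongly harmonic frame; uniform random samples
# stand in for broadband noise.
tone = [math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(64)]
silence = [0.0] * 64
```

With these toy frames, `is_speech(tone)` is true, while the noise frame is rejected by the entropy test and the silent frame by the energy test. In practice the thresholds would need to be tuned, or adapted over time, for the recording conditions.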
“…Time-domain methods include energy-based endpoint detectors [8,9], zero-crossing rate-based methods [10], Autocorrelation Function (ACF) based methods [11] and different feature combination detection methods [12][13][14][15]. Energy-based noise detection methods use differences in energy to distinguish noise and speech.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…This reflects, in outline, the frequency characteristics of the signal. It is generally thought that a speech segment will have a short-time zero crossing rate that is lower than a certain threshold, while the noise will be higher than the threshold [13]. However, the zero-crossing rate of noise in laser detected speech can be low or high, because the causes of the noise differ.…”
Section: Related Work; citation type: mentioning
confidence: 99%
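
The threshold rule described in the statement above can be sketched as follows. This is an illustrative example only: the function names and the threshold value are assumptions, and, as the statement itself notes, a fixed threshold can fail when the noise has a low zero-crossing rate.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def is_speech_by_zcr(frame, threshold=0.25):
    """Hypothetical rule: label a frame as speech when its ZCR is below
    the threshold. The value 0.25 is an assumption for illustration."""
    return zero_crossing_rate(frame) < threshold


# A slow sinusoid crosses zero rarely; an alternating sequence crosses zero
# at every sample and stands in for high-frequency noise.
slow_tone = [math.sin(2 * math.pi * 2 * n / 100) for n in range(100)]
alternating = [(-1.0) ** n for n in range(100)]
```

Here `is_speech_by_zcr(slow_tone)` is true and `is_speech_by_zcr(alternating)` is false. A low-frequency hum would also pass this test, which is the failure mode the statement describes for noise whose zero-crossing rate can be either low or high.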
“…In the face of one-dimensional dynamic sound signals, how to perform better recognition and classification also comes down to how to choose a better feature representation. The currently commonly used feature representations can be divided into features directly extracted from the original signal in the time domain (such as autocorrelation, zero-crossing rate) [19, 20] and features obtained by converting the signal into frequency (such as spectral centroid [21], Mel-scale frequency cepstral coefficient). The most widely used feature extraction technique is the Mel-scale frequency cepstral coefficient (MFCC) [22], but it also shows some limitations when dealing with low signal-to-noise ratio (SNR) sound signals.…”
Section: Introduction; citation type: mentioning
confidence: 99%