Voiced/Unvoiced Decision for Speech Signals Based on Zero-Crossing Rate and Energy

Bachu, Rajesh; Kopparthi, S.; Adapa, B.; Barkana, Buket D.

doi:10.1007/978-90-481-3660-5_47

Cited by 120 publications

(72 citation statements)

References 6 publications

“…if frame t is speech (8), where t_p is the previous noise frame and β is the forgetting factor, with 0 < β < 1.…”

Section: Likelihood Ratio Measure (mentioning)

confidence: 99%

“…Many researchers have studied different methods to develop an efficient VAD and most of them are heuristics using different speech parameters, such as, energy [5], [6], [7], zero crossing rate [2], [8], cepstral [9], LPC [10], etc. However, the algorithms based on speech features with heuristic rules have difficulty in coping with real world noises at low SNR conditions.…”

Section: Introduction (mentioning)

confidence: 99%

Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure

Babul

2019

IJCA

In this paper, a Mel-scaled AR (Mel-AR) model based VAD is presented, where likelihood ratio measure is used to classify the input speech frames as speech/non-speech segments. The Mel-AR model parameters have been estimated on the linear frequency scale from the input speech signal without applying bilinear transformation. This has been done by employing a first-order all-pass filter rather than unit delay. The performance of the proposed VAD is evaluated on Aurora-2 database by measuring FAR and FRR. The equal false rate (EFR) at the crossover point is also presented as a merit of VAD. In addition, the performance of the proposed VAD in speech recognition is verified by incorporating it with a Mel-Wiener filter for MLPC based noisy speech recognition.
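The evaluation described in this abstract can be illustrated with a minimal sketch. It assumes that FAR is the fraction of non-speech frames accepted as speech and FRR is the fraction of speech frames rejected as non-speech, and that the equal false rate is read off at the threshold where the two are closest. The function names, the per-frame scores, and the toy labels are hypothetical and are not taken from the cited paper.

```python
def far_frr(scores, labels, threshold):
    """Return (FAR, FRR) for one decision threshold.

    Assumption: a frame is accepted as speech when score >= threshold,
    and labels[i] is True when frame i really is speech.
    """
    false_accepts = sum(1 for s, is_speech in zip(scores, labels)
                        if not is_speech and s >= threshold)
    false_rejects = sum(1 for s, is_speech in zip(scores, labels)
                        if is_speech and s < threshold)
    n_speech = sum(1 for is_speech in labels if is_speech)
    n_nonspeech = len(labels) - n_speech
    return false_accepts / n_nonspeech, false_rejects / n_speech


def equal_false_rate(scores, labels):
    """Scan the observed scores as thresholds and return (threshold, EFR)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for t in sorted(set(scores)):
        far, frr = far_frr(scores, labels, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    _, t, far, frr = best
    return t, (far + frr) / 2


if __name__ == "__main__":
    # Hypothetical per-frame scores (higher means more speech-like).
    scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [False, False, False, True, False, True, True, True]
    print(equal_false_rate(scores, labels))
```

A real evaluation would sweep the threshold over scores produced by the detector on a labelled corpus; the scan over observed scores here is only the simplest way to locate the crossover.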

“…Indicative examples of time-domain estimators include the zero-crossing-rate (ZCR) [3][4], the measurement of energy level [3] [4], the peak-to-valley difference (PVD) [2] and the autocorrelation (ACORR) [5]. The measurement of the energy level and the ACORR methods again rely on an adaptive threshold for background noises, and tend to fail when the magnitude of the noises approaches or exceeds that of the voiced sounds, even when separated in time.…”

Section: Motivation and Related Work (mentioning)

confidence: 99%

Listening for people: Exploiting the spectral structure of speech to robustly perceive the presence of people

Hilsenbeck

Kirchner

2011

2011 IEEE/RSJ International Conference on Intelligent Robots and Systems

As the desire to see robots ubiquitous in society grows, so does the need for providing the robots with the means of building awareness of any humans with which it may be sharing the environment. This paper presents a real-world suitable system which enables robots to robustly perceive the presence of people acoustically. The proposed binaural system first identifies voiced signal by means of a novel approach to Voice Activity Detection that exploits the spectral signature and characteristics of speech without reliance on a priori knowledge. Bearing estimates for each speaker are then made using a multitrack particle filter with a belief update function comprised of a Cross-correlation bearing estimate and an estimate of the speaker's fundamental frequency. Results, from an evaluation of each of the major system components and a system evaluation in which the robot successfully built human-centric situational awareness of the three humans with which it shared an office lunch-room containing typical background noises, are presented and discussed.

“…If the number of zero crossings is more in a given signal, then the signal is changing rapidly and accordingly the signal may contain high frequency information which is termed as unvoiced speech. On the other hand, if the number of zero crossings is less, then the signal is changing slowly and accordingly the signal may contain low frequency information which is termed as voiced speech [17]. That's why the Zero Crossing Rate can give information about the frequency content of the signal, which can be considered as a good indicator about the speaker itself.…”

Section: B. Short Time Zero Crossing Rate (STZCR) (mentioning)

confidence: 99%

An Improved Approach for Text-Independent Speaker Recognition

Chakroun

Zouari

Frikha

2016

IJACSA
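The relationship quoted in this entry (many zero crossings suggest unvoiced speech, few suggest voiced speech) can be sketched together with short-time energy, the other feature named in the title of the indexed paper. The following is a minimal illustration only: the function names, the 8 kHz sampling rate, the 30 ms frame length, and both thresholds are assumptions chosen for the example, not values from the paper or from any citing work.

```python
import math


def frame_features(frame):
    """Return (zero-crossing rate, short-time energy) for one frame."""
    # A crossing is counted whenever consecutive samples differ in sign.
    crossings = sum(1 for prev, cur in zip(frame, frame[1:])
                    if (prev >= 0) != (cur >= 0))
    zcr = crossings / (len(frame) - 1)
    energy = sum(x * x for x in frame) / len(frame)
    return zcr, energy


def classify_frame(frame, zcr_threshold=0.25, energy_threshold=1e-4):
    """Label a frame as 'silence', 'voiced' or 'unvoiced'.

    Both thresholds are hypothetical; in practice they would be tuned
    on labelled speech recorded under the target conditions.
    """
    zcr, energy = frame_features(frame)
    if energy < energy_threshold:
        return "silence"
    return "voiced" if zcr < zcr_threshold else "unvoiced"


def sine_frame(freq_hz, amplitude, sample_rate=8000, length=240):
    """Synthetic test frame: a pure tone standing in for real speech."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(length)]


if __name__ == "__main__":
    print(classify_frame(sine_frame(150.0, 0.5)))    # low-frequency, strong tone
    print(classify_frame(sine_frame(3000.0, 0.1)))   # high-frequency, weak tone
    print(classify_frame([0.0] * 240))               # all-zero frame
```

Pure tones are used only so the example is self-contained; real unvoiced speech is noise-like, and fixed thresholds of this kind are exactly what the citing statements above describe as fragile at low SNR.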
