Extraction of Fundamental Frequency From Degraded Speech Using Temporal Envelopes at High SNR Frequencies

Aneeja, G.; Yegnanarayana, B.

doi:10.1109/taslp.2017.2666425

Cited by 23 publications

(24 citation statements)

References 24 publications


“…SFF has high resolution in terms of frequency leading to sharp harmonics utilized for the extraction of fundamental frequency [32,33]. The discrete-time speech signal denoted by s(n) is differenced, and the differenced signal is denoted by x(n) = s(n) − s(n − 1). The sampling frequency is Fs.…”

Section: Proposed Methods (mentioning)

confidence: 99%

“…The output of SFF at each stage of frequency has large SNR areas employed to develop the speech and nonspeech region detection. SFF has high resolution in terms of frequency leading to sharp harmonics utilized for the extraction of fundamental frequency [32,33]. The discrete-time speech signal denoted by s(n) is differenced, and the differenced signal is denoted by x(n) = s(n) − s(n − 1).…”

Section: Proposed Methods (mentioning)

confidence: 99%
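The differencing step quoted above, x(n) = s(n) − s(n − 1), can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `difference` is hypothetical, and treating the sample before the first one as zero is an assumption, since the quoted text does not say how the boundary is handled.

```python
import numpy as np

def difference(s):
    """Return x(n) = s(n) - s(n - 1) for a 1-D signal.

    Assumption: s(-1) is taken to be 0, so x(0) = s(0) and the
    output has the same length as the input.
    """
    s = np.asarray(s, dtype=float)
    # prepend=0.0 supplies the assumed s(-1) = 0 boundary sample
    return np.diff(s, prepend=0.0)

# Small usage example with made-up sample values
s = np.array([1.0, 3.0, 6.0, 10.0])
x = difference(s)
```

Differencing acts as a simple first-order high-pass operation, so slowly varying components of s(n) are attenuated in x(n).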


A novel double pole transfer function‐single frequency filtering approach for speech enhancement

Srinivasarao

Ghanekar

2020

Trans Emerging Tel Tech


Speech intelligibility improvement is a major field of recent research within the audiology and hearing domain. Speech signals are usually degraded by environmental and background noise. Several approaches to speech intelligibility improvement have been presented in the literature. In this paper, a method of speech enhancement based on the Double Pole Transfer Function-Single Frequency Filter (DPTF-SFF) is proposed. In the SFF approach, the temporal and spectral resolutions are controlled by a single filter parameter that corresponds to the pole position in the complex z-plane. The proposed method uses a double pole transfer function SFF that improves the magnitude and phase at the various frequencies used in the synthesis of the original signal. In the analysis of this method, clean speech signals are taken from the GRID corpus database and noise signals are taken from the NOIZEUS and NOISEX-92 databases. Performance is evaluated in terms of the Perceptual Evaluation of Speech Quality, Short-Time Objective Intelligibility, and Segmental Signal to Noise Ratio metrics. The results show that the proposed method is efficient, giving higher intelligibility scores than the existing methods at different SNR levels (dB) and for the different noises employed.
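To make the role of the pole position concrete, here is a minimal sketch of a single-pole SFF envelope at one frequency. It follows the formulation commonly described for single frequency filtering (shift the frequency of interest to half the sampling frequency, then apply a single-pole filter with its pole at z = −r); the excerpt above does not spell this out, so the formulation, the function name `sff_envelope`, and the default r = 0.99 are all assumptions. It is not the double pole variant proposed in the paper.

```python
import numpy as np

def sff_envelope(x, fk, fs, r=0.99):
    """Envelope of signal x at frequency fk (Hz), single-pole SFF sketch.

    Assumed formulation: multiply x(n) by exp(j*wbar*n) with
    wbar = pi - 2*pi*fk/fs, which moves the component at fk to fs/2,
    then filter with H(z) = 1 / (1 + r z^-1), whose pole lies at z = -r.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)
    wbar = np.pi - 2.0 * np.pi * fk / fs
    shifted = x * np.exp(1j * wbar * n)

    # Single-pole recursion: y(n) = -r * y(n - 1) + shifted(n)
    y = np.zeros(x.size, dtype=complex)
    prev = 0.0 + 0.0j
    for i in range(x.size):
        prev = shifted[i] - r * prev
        y[i] = prev
    return np.abs(y)

# Usage with a synthetic 1000 Hz tone sampled at 8 kHz (made-up test signal)
fs = 8000
t = np.arange(800) / fs
tone = np.cos(2.0 * np.pi * 1000.0 * t)
env_on = sff_envelope(tone, 1000.0, fs)   # filter tuned to the tone
env_off = sff_envelope(tone, 2500.0, fs)  # filter tuned elsewhere
```

In this sketch, moving r closer to 1 narrows the filter around the chosen frequency (finer spectral resolution) while lengthening its response in time (coarser temporal resolution), which is the trade-off the abstract attributes to the pole position.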


“…It is also shown that the optimal choice of F depends on the frame length and the harmonic order [13]. However, for simplicity and fast implementation, in this paper, we set F = 2^14. The state space for the discrete variables can be expressed as…”

Section: The State Evolution Model (mentioning)

confidence: 99%

“…Using (9), (12), (13), (14), (19) and (20), a closed-form marginal likelihood can be obtained, i.e., p(y_n | ẍ_n, Y_{n−1})…”

Section: Pitch Tracking (mentioning)

confidence: 99%

Robust Bayesian Pitch Tracking Based on the Harmonic Model

Shi

Nielsen

Jensen

et al. 2019

IEEE/ACM Trans. Audio Speech Lang. Process.


Fundamental frequency is one of the most important characteristics of speech and audio signals. Harmonic model-based fundamental frequency estimators offer a higher estimation accuracy and robustness against noise than the widely used autocorrelation-based methods. However, the traditional harmonic model-based estimators do not take the temporal smoothness of the fundamental frequency, the model order, and the voicing into account as they process each data segment independently. In this paper, a fully Bayesian fundamental frequency tracking algorithm based on the harmonic model and a first-order Markov process model is proposed. Smoothness priors are imposed on the fundamental frequencies, model orders, and voicing using first-order Markov process models. Using these Markov models, fundamental frequency estimation and voicing detection errors can be reduced. Using the harmonic model, the proposed fundamental frequency tracker has an improved robustness to noise. An analytical form of the likelihood function, which can be computed efficiently, is derived. Compared to the state-of-the-art neural network and non-parametric approaches, the proposed fundamental frequency tracking algorithm has superior performance in almost all investigated scenarios, especially in noisy conditions. For example, under 0 dB white Gaussian noise, the proposed algorithm reduces the mean absolute errors and gross errors by 15% and 20% on the Keele pitch database and 36% and 26% on sustained /a/ sounds from a database of Parkinson's disease voices. A MATLAB version of the proposed algorithm is made freely available for reproduction of the results.


“…The presence of high SNR regions in the SFF outputs was exploited for speech and nonspeech detection, after suitably compensating for the noise in the degraded speech signal [13]. The SFF method was also used for extracting GCIs [14], locating burst onsets [15] and fundamental frequency extraction [16,17]. The significance of the phase of SFF output of speech is also examined recently in [18].…”

Section: Introduction (mentioning)

confidence: 99%

Detection of Glottal Closure Instants in Degraded Speech Using Single Frequency Filtering Analysis

2018


Impulse-like characteristics of excitation occur at the glottal closure instant (GCI) due to sharp closure of the vibrating vocal folds in each glottal cycle. The GCIs are detected from the excitation component of the speech signal, and the excitation component is derived using inverse filtering or its variants. In this paper we propose a method for GCI detection based on single frequency filtering (SFF) of the speech signal. The SFF output has a high signal-to-noise ratio (SNR) in speech regions. The variance (across frequency) contour computed from the SFF output shows rapid changes around the GCIs, and these rapid changes can be observed even when the speech signal is degraded. Thus the GCI locations can be extracted even from degraded speech using the SFF analysis. The robustness of the method is demonstrated for several cases of degradation of the speech signal.
