We present an analysis of the rate of sign changes in the discrete Fourier spectrum of a sequence. The sign changes of either the real or the imaginary part of the spectrum are considered, and their rate is termed the spectral zero-crossing rate (SZCR). We show that the SZCR carries information about the locations of transients within the temporal observation window. We show duality with temporal zero-crossing rate analysis by expressing the spectrum of a signal as a sum of sinusoids with random phases. This extension leads to spectral-domain iterative filtering approaches that stabilize the spectral zero-crossing rate and improve the location estimates. The localization properties are compared with group-delay-based localization metrics in a stylized signal setting well known in the speech processing literature. We show applications to epoch estimation in voiced speech signals using the SZCR on the integrated linear prediction residue. The performance of the SZCR-based epoch localization technique is competitive with state-of-the-art epoch estimation techniques that are based on the average pitch period.
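As a rough illustration of the idea in the abstract above, the sketch below counts sign changes in the real part of the DFT of a single unit impulse. For an impulse at sample n0 in an N-sample window, the real part of the spectrum is cos(2πk·n0/N), which changes sign about 2·n0 times across the N bins when n0 < N/2, so the SZCR scaled by N/2 approximates the impulse location. The helper name `spectral_zcr`, the tolerance `tol`, and the toy signal are assumptions made for illustration; this is not the authors' implementation, and the iterative filtering mentioned above is not shown.

```python
import numpy as np


def spectral_zcr(x, tol=1e-9):
    """Rate of sign changes in the real part of the DFT of x (hypothetical helper)."""
    spectrum_real = np.fft.fft(x).real
    # Drop bins that are numerically zero so that a crossing through an
    # exact zero is counted once rather than missed or double counted.
    nonzero = spectrum_real[np.abs(spectrum_real) > tol]
    signs = np.sign(nonzero)
    sign_changes = np.count_nonzero(signs[1:] != signs[:-1])
    return sign_changes / len(x)


# Toy signal (assumption): one unit impulse at sample n0 in an N-sample window.
N, n0 = 256, 40
x = np.zeros(N)
x[n0] = 1.0

szcr = spectral_zcr(x)
# Re{X[k]} = cos(2*pi*k*n0/N) has roughly 2*n0 sign changes for n0 < N/2,
# so half the sign-change count serves as a location estimate.
location_estimate = szcr * N / 2
print(f"SZCR = {szcr:.4f}, estimated location = {location_estimate:.1f}")
```

Because cos(2πk·n0/N) is unchanged when n0 is replaced by N − n0, this toy estimate cannot distinguish an impulse in the first half of the window from its mirror image in the second half.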
We establish zero-crossing rate (ZCR) relations between the input and the subbands of a maximally decimated -channel power complementary analysis filterbank when the input is a stationary Gaussian process. The ZCR at lag is defined as the number of sign changes between the samples of a sequence and its -sample shifted version, normalized by the sequence length. We derive the relationship between the ZCR of the Gaussian process at lags that are integer multiples of and the subband ZCRs. Based on this result, we propose a robust iterative autocorrelation estimator for a signal consisting of a sum of sinusoids of fixed amplitudes and uniformly distributed random phases. Simulation results show that the performance of the proposed estimator is better than the sample autocorrelation over the SNR range of to 15 dB. Validation on a segment of a trumpet signal showed similar performance gains.
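The lagged ZCR defined in the abstract above can be sketched as follows. The lag symbol was lost from the text, so the name `k`, the helper `lagged_zcr`, and the test signal are assumptions. The sketch normalizes by the number of compared sample pairs rather than by the full sequence length, which differs negligibly for long sequences. The conversion from ZCR to a normalized autocorrelation uses the classical arcsine law for zero-mean stationary Gaussian processes, ρ = cos(π·ZCR). It is included only to indicate why ZCRs carry autocorrelation information; it is not the subband-based iterative estimator that the abstract proposes.

```python
import numpy as np


def lagged_zcr(x, k):
    """Fraction of pairs (x[n], x[n+k]) with opposite signs, for k >= 1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Assumption: normalize by the number of compared pairs, len(x) - k.
    return np.count_nonzero(x[:-k] * x[k:] < 0) / (len(x) - k)


rng = np.random.default_rng(0)
e = rng.standard_normal(200001)
# Moving-average Gaussian process with known normalized autocorrelation:
# rho(1) = 0.5 and rho(k) = 0 for k >= 2.
x = e[1:] + e[:-1]

for k in (1, 2):
    zcr = lagged_zcr(x, k)
    # Arcsine law, valid for zero-mean stationary Gaussian data.
    rho_hat = np.cos(np.pi * zcr)
    print(f"lag {k}: ZCR = {zcr:.3f}, implied autocorrelation = {rho_hat:.3f}")
```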
We analyze the spectral zero-crossing rate (SZCR) properties of transient signals and show that the SZCR contains accurate localization information about the transient. For a train of pulses containing transient events, the SZCR computed on a sliding-window basis is useful for locating the impulses accurately. We present the properties of the SZCR on standard stylized signal models and then show how it may be used to estimate the epochs in speech signals. We also present comparisons with some state-of-the-art techniques that are based on the group-delay function. Experiments on real speech show that the proposed SZCR technique is better than other group-delay-based epoch detectors. In the presence of noise, a comparison with the zero-frequency filtering (ZFF) technique and the Dynamic programming projected Phase-Slope Algorithm (DYPSA) showed that the performance of the SZCR technique is better than that of DYPSA and inferior to that of ZFF. For highpass-filtered speech, where ZFF performance suffers drastically, the identification rates of SZCR are better than those of DYPSA.
Frequency-domain linear prediction (FDLP) is widely used in speech coding for modeling the envelopes of transient signals, such as voiced and unvoiced stops, plosives, etc. FDLP fits an autoregressive model to the discrete cosine transform (DCT) coefficients of a sequence. The spectral prediction coefficients provide a parametric model of the temporal envelope. The prediction coefficients are obtained by solving the set of Yule-Walker equations that relate the lagged spectral autocorrelation values. A limitation of the direct approach to computing the spectral autocorrelation values is that the sequence has to be padded with a large number of zeros for the autocorrelation estimates to be reasonably accurate. This comes at the cost of increased computational complexity. We present an efficient and accurate method for computing the spectral autocorrelation samples. We show that the spectral autocorrelation can be computed as cosine-weighted temporal centroids, where the weighting function depends on the time index of the samples.
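The direct approach described in the abstract above (DCT, spectral autocorrelation, Yule-Walker solve) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `dct2` and `fdlp_envelope`, the unnormalized DCT-II, the biased autocorrelation without zero padding, the model order, and the single-impulse test signal are all choices made here. It does not implement the efficient cosine-weighted centroid method that the abstract proposes.

```python
import numpy as np


def dct2(x):
    """Unnormalized DCT-II computed from its definition (O(N^2), fine for a demo)."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N)) @ x


def fdlp_envelope(x, order):
    """Direct FDLP sketch: AR fit to the DCT coefficients via Yule-Walker."""
    c = dct2(np.asarray(x, dtype=float))
    N = len(c)
    # Biased autocorrelation of the DCT sequence at lags 0..order.
    r = np.array([c[: N - lag] @ c[lag:] for lag in range(order + 1)])
    # Yule-Walker equations: Toeplitz(r[0..order-1]) a = r[1..order].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])
    # Evaluate 1/|A|^2 at the DCT-II "frequencies", one per time index.
    omega = np.pi * (2 * np.arange(N) + 1) / (2 * N)
    p = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(omega, p)) @ a
    return 1.0 / np.abs(A) ** 2


# Toy transient (assumption): a unit impulse at sample n0.
N, n0 = 256, 80
x = np.zeros(N)
x[n0] = 1.0

envelope = fdlp_envelope(x, order=2)
peak = int(np.argmax(envelope))
print(f"true location = {n0}, envelope peak = {peak}")
```

The DCT-II of an impulse at n0 is a cosine in the frequency index, so a second-order model places a pole pair near the matching angle, and the all-pole response evaluated on the DCT frequency grid peaks close to n0.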
We present a novel approach to representing transients using spectral-domain amplitude-modulated/frequency-modulated (AM-FM) functions. The model is applied to the real and imaginary parts of the Fourier transform (FT) of the transient. The model is suitable because transients are well localized in time, so the real and imaginary parts of their Fourier spectrum have a modulation structure. The spectral AM is the envelope and the spectral FM is the group delay function. The group delay is estimated using spectral zero-crossings, and the spectral envelope is estimated using a coherent demodulator. We show that the proposed technique is robust to additive noise. We present applications of the proposed technique to castanets and stop consonants in speech.