Nonlinear audio system identification generally relies on Gaussianity, whiteness and stationarity hypotheses for the input signal, although audio signals are non-Gaussian, highly correlated and non-stationary. However, since the physical behavior of nonlinear audio systems is input-dependent, they should be identified using natural audio signals (speech or music) as input, rather than artificial signals (sweeps or noise) as is usually done. We propose an identification scheme that conditions audio signals to fit the desired properties for efficient identification. The identification system consists of (1) a Gaussianization step that makes the signal near-Gaussian under a perceptual constraint; (2) a predictor filterbank that whitens the signal; (3) an orthonormalization step that enhances the statistical properties of the input vector of the last step, under a Gaussianity hypothesis; (4) an adaptive nonlinear model. The proposed scheme improves the convergence rate of the identification and reduces the steady-state identification error compared to other schemes, such as classical adaptive nonlinear identification.
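The benefit of whitening before adaptive identification can be illustrated with a minimal sketch. This is not the paper's scheme (which also includes Gaussianization and orthonormalization steps and a nonlinear model): it only shows the whitening idea, using a Yule-Walker linear predictor and a plain NLMS filter on a toy linear FIR system. Applying the same prediction-error filter to both the input and the desired signal preserves the system to identify, since the filters commute. All function names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def lpc_coeffs(x, order=8):
    """Estimate linear-predictor coefficients from the autocorrelation
    (Yule-Walker normal equations); illustrative, no Levinson recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def whiten(x, a):
    """Apply the prediction-error filter A(z) = 1 - sum_k a_k z^-k."""
    return np.convolve(x, np.r_[1.0, -a])[:len(x)]

def nlms_identify(x, d, taps=4, mu=0.5, eps=1e-8):
    """Plain NLMS adaptive filter: find w such that w * x approximates d."""
    w = np.zeros(taps)
    for n in range(taps - 1, len(x)):
        u = x[n - taps + 1:n + 1][::-1]      # most recent sample first
        e = d[n] - w @ u                     # a priori error
        w += mu * e * u / (eps + u @ u)      # normalized update
    return w

# Toy experiment: a correlated (non-white) input drives an unknown FIR system.
x = np.convolve(rng.standard_normal(4000), np.ones(5) / 5)[:4000]
h = np.array([0.8, -0.3, 0.2, 0.1])          # "unknown" system to identify
d = np.convolve(x, h)[:len(x)]

# Whitening input and desired signal with the SAME filter A(z) preserves h,
# because A(z) commutes with the LTI system H(z).
a = lpc_coeffs(x)
w = nlms_identify(whiten(x, a), whiten(d, a))
```

On the whitened pair, the NLMS estimate `w` converges to the true taps `h`; with the raw correlated input, the eigenvalue spread of the input autocorrelation slows convergence, which is the motivation for the whitening stage.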
The long-term harmonic plus noise model (LT-HNM) for speech achieves significant data compression by exploiting the smooth evolution of the time trajectories of the short-term harmonic plus noise model parameters through a discrete cosine model (DCM). In this paper, we extend the LT-HNM to a complete low bit-rate speech coder. A Normalized Split Vector Quantization (NSVQ) is proposed to quantize the variable-dimension LT-DCM vectors. The NSVQ is designed according to the properties of the DCM vectors obtained from a standard speech database. The resulting LT-HNM coder reaches an average bit-rate of 2.7 kbps for wideband speech. The proposed coder is evaluated in terms of modeling and coding errors, bit-rate, listening quality and intelligibility.
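The core idea of a normalized split VQ can be sketched generically: normalize a vector by a gain, split it into sub-vectors, and code each part with its own small codebook. The details of the paper's NSVQ (how normalization is defined and how variable dimensions are handled) are not reproduced here; the fixed 8-dimensional vectors, the max-amplitude gain, the codebook sizes and the LBG-style training below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_codebook(vectors, size=8, iters=20):
    """Toy k-means (LBG-style) codebook training for one sub-vector class."""
    cb = vectors[rng.choice(len(vectors), size, replace=False)].copy()
    for _ in range(iters):
        # Assign each training vector to its nearest codeword, then update.
        idx = np.argmin(((vectors[:, None] - cb[None]) ** 2).sum(-1), axis=1)
        for k in range(size):
            if np.any(idx == k):
                cb[k] = vectors[idx == k].mean(axis=0)
    return cb

def split_vq(vec, codebooks, part=4):
    """Gain-normalize a vector, split it into fixed-size sub-vectors,
    and code each part with its own codebook (nearest-neighbour search)."""
    gain = float(np.max(np.abs(vec))) or 1.0
    parts = (vec / gain).reshape(-1, part)
    idx = [int(np.argmin(((cb - p) ** 2).sum(-1)))
           for cb, p in zip(codebooks, parts)]
    rec = gain * np.concatenate([cb[i] for cb, i in zip(codebooks, idx)])
    return gain, idx, rec   # transmit gain + one index per sub-vector

# Train one codebook per 4-sample sub-vector of 8-dimensional vectors.
train = rng.standard_normal((400, 8))
books = [train_codebook(train[:, :4]), train_codebook(train[:, 4:])]
gain, idx, rec = split_vq(train[0], books)
```

The bit cost per vector is the gain plus one codebook index per sub-vector (here 3 bits each), which is how split VQ keeps codebook sizes tractable compared with quantizing the full vector at once.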
Audio watermarking is usually used as a multimedia copyright protection tool or as a system that embeds metadata in audio signals. In this paper, watermarking is viewed as a preprocessing step for further audio processing systems: the watermark signal conveys no information; rather, it is used to modify the statistical characteristics of an audio signal, in particular its non-stationarity. The embedded watermark is piecewise stationary, so adding it to the host signal makes the watermarked signal more stationary. In some audio processing fields, this can improve the performance of systems that are very sensitive to time-variant signal statistics. This paper presents an analysis of the impact of perceptual watermarking on the stationarity of audio signals. The study is based on stationarity indices, which measure variations in the spectral characteristics of signals using time-frequency representations. Simulation results are presented for two kinds of signals: artificial signals and audio signals (speech and music). A comparison of stationarity indices between watermarked and original audio signals shows a significant stationarity enhancement of the watermarked signal, especially for transient attacks.

Index Terms—Perceptual audio watermarking, stationarity indices, time-frequency representations.
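A stationarity index of the kind described can be sketched as follows. This is not the paper's exact measure; it is a simple spectrogram-based variant (an assumption) that scores how far each short-time log-spectrum deviates from the time-averaged spectrum, so that a signal with an abrupt transient scores higher than a stationary one.

```python
import numpy as np

def stationarity_index(x, frame=256, hop=128):
    """Illustrative stationarity index: mean Euclidean distance between
    each short-time log-spectrum and the time-averaged log-spectrum.
    Larger values indicate stronger spectral non-stationarity."""
    win = np.hanning(frame)
    frames = np.array([x[i:i + frame] * win
                       for i in range(0, len(x) - frame + 1, hop)])
    spec = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12)
    return float(np.mean(np.linalg.norm(spec - spec.mean(axis=0), axis=1)))

rng = np.random.default_rng(2)
noise = rng.standard_normal(4096)
steady = noise                                      # stationary signal
attack = np.r_[0.01 * noise[:2048], noise[2048:]]   # abrupt transient onset

si_steady = stationarity_index(steady)
si_attack = stationarity_index(attack)
```

The signal with the transient onset yields a larger index than the stationary noise, mirroring how the paper's indices flag variations in spectral characteristics over time.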