Constant-Q signal analysis and synthesis

Youngberg, J.; Boll, S.

doi:10.1109/icassp.1978.1170547

Cited by 60 publications

(24 citation statements)

References 7 publications


“…This is not aligned well with human perception, which is known to have a constant Q factor between 500Hz and 20kHz [13]. Perceptually motivated, the constant Q transform (CQT) was introduced in [17] and later refined in [18]. Applying the CQT allows better time-frequency resolution as described in [13].…”

Section: Proposed Features (mentioning)

confidence: 99%

Non-intrusive Speech Quality Assessment Using Neural Networks

Avila, Gamper, Reddy, et al. 2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Estimating the perceived quality of an audio signal is critical for many multimedia and audio processing systems. Providers strive to offer optimal and reliable services in order to increase the user quality of experience (QoE). In this work, we present an investigation of the applicability of neural networks for non-intrusive audio quality assessment. We propose three neural network-based approaches for mean opinion score (MOS) estimation. We compare our results to three instrumental measures: the perceptual evaluation of speech quality (PESQ), the ITU-T Recommendation P.563, and the speech-to-reverberation energy ratio. Our evaluation uses a speech dataset contaminated with convolutive and additive noise, labeled using a crowd-based QoE evaluation, evaluated with Pearson correlation with MOS labels, and mean-squared-error of the estimated MOS. Our proposed approaches outperform the aforementioned instrumental measures, with a fully connected deep neural network using Mel-frequency features providing the best correlation (0.87) and the lowest mean squared error (0.15).

Index Terms: Audio quality assessment, speech quality assessment, deep neural network

* Work on this project performed as an intern at Microsoft Research Labs, Redmond, WA.


“…The CQT was proposed [51,52]. Here, Q is defined as the ratio of center frequency to bandwidth, which is as Eq.…”

Section: Constant-Q Transform (mentioning)

confidence: 99%

Discriminative features based on modified log magnitude spectrum for playback speech detection

Yang, Ren, et al. 2020

J AUDIO SPEECH MUSIC PROC.


In order to improve the performance of hand-crafted features to detect playback speech, two discriminative features, constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients, are proposed for playback speech detection in this work. They rely on our findings that variance-based modified log magnitude spectrum and mean-based modified log magnitude spectrum can enhance the discriminative power between genuine speech and playback speech. Then constant-Q variance-based octave coefficients (constant-Q mean-based octave coefficients) can be obtained by combining variance-based modified log magnitude spectrum (mean-based modified log magnitude spectrum), octave segmentation, and discrete cosine transform. Finally, constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients are evaluated on ASVspoof 2017 corpus version 2.0 and ASVspoof 2019 physical access, respectively. Experimental results show that variance-based modified log magnitude spectrum and mean-based modified log magnitude spectrum can produce discriminative features toward playback speech. Further results on the two databases show that constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients can perform better than some common features, such as mel frequency cepstral coefficients and constant-Q cepstral coefficients.
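The citation statement for this entry defines Q as the ratio of center frequency to bandwidth. A minimal Python sketch of that ratio follows. The function names `q_factor` and `q_from_bins_per_octave` are hypothetical, and the bins-per-octave relation is the commonly used formulation for geometrically spaced bins; it is assumed here for illustration and is not taken from the cited papers.

```python
def q_factor(center_hz, bandwidth_hz):
    """Q as the ratio of center frequency to bandwidth."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return center_hz / bandwidth_hz


def q_from_bins_per_octave(bins_per_octave):
    """Q implied by geometric bin spacing (assumed formulation).

    If the bandwidth of a bin equals the gap to the next center frequency,
    f * (2**(1/B) - 1), then Q = 1 / (2**(1/B) - 1), independent of f.
    """
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


if __name__ == "__main__":
    # A 1 kHz band that is 50 Hz wide has the same Q as a 2 kHz band
    # that is 100 Hz wide: that is what "constant Q" means.
    print(q_factor(1000.0, 50.0), q_factor(2000.0, 100.0))
    print(q_from_bins_per_octave(12))
```

Under this assumption, doubling the center frequency doubles the bandwidth, so the relative frequency resolution stays the same across the spectrum.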


“…In contrast, the constant-Q transform (CQT), originally introduced in [22] and in music processing by J. Brown [2], provides a frequency resolution that depends on geometrically spaced center frequencies of the analysis windows.…”

Section: Introduction (mentioning)

confidence: 99%

A Framework for Invertible, Real-Time Constant-Q Transforms

Holighaus, Dörfler, Velasco, et al. 2013

IEEE Trans. Audio Speech Lang. Process.


Audio signal processing frequently requires time-frequency representations and in many applications, a non-linear spacing of frequency-bands is preferable. This paper introduces a framework for efficient implementation of invertible signal transforms allowing for non-uniform and in particular non-linear frequency resolution. Non-uniformity in frequency is realized by applying nonstationary Gabor frames with adaptivity in the frequency domain. The realization of a perfectly invertible constant-Q transform is described in detail. To achieve real-time processing, independent of signal length, slicewise processing of the full input signal is proposed and referred to as sliCQ transform. By applying frame theory and FFT-based processing, the presented approach overcomes computational inefficiency and lack of invertibility of classical constant-Q transform implementations. Numerical simulations evaluate the efficiency of the proposed algorithm and the method's applicability is illustrated by experiments on real-life audio signals.
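The citation statement for this entry mentions geometrically spaced center frequencies of the analysis windows. The Python sketch below illustrates that spacing. The names `cqt_center_frequencies` and `cqt_window_lengths`, the 55 Hz starting frequency, and the 44.1 kHz sample rate are hypothetical choices. The window-length rule (Q cycles of each center frequency) is a common classical formulation, assumed here; it is not the nonstationary Gabor frame construction of the paper above.

```python
import math


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced center frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_window_lengths(freqs, sample_rate, bins_per_octave):
    """Window length in samples for each bin (assumed classical rule).

    Each window spans roughly Q periods of its center frequency, so
    low-frequency bins get long windows and high-frequency bins short ones.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return [math.ceil(q * sample_rate / f) for f in freqs]


if __name__ == "__main__":
    freqs = cqt_center_frequencies(f_min=55.0, n_bins=25, bins_per_octave=12)
    lengths = cqt_window_lengths(freqs, sample_rate=44100, bins_per_octave=12)
    for f, n in list(zip(freqs, lengths))[::12]:
        print(f"{f:8.2f} Hz -> {n} samples")
```

With 12 bins per octave, every twelfth bin doubles in frequency and its window roughly halves in length, which is the trade between frequency and time resolution that the citing papers refer to.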

