The Large Time-Frequency Analysis Toolbox 2.0

Průša, Zdeněk; Søndergaard, Peter; Holighaus, Nicki; Wiesmeyr, Christoph; Balázs, Péter

doi:10.1007/978-3-319-12976-1_25

Cited by 83 publications

(57 citation statements)

References 31 publications

“…Both experiments were run on the first 5 seconds of all 70 test signals from the Sound Quality Assessment Material recordings for subjective tests provided by the European Broadcasting Union (SQAM database) [68]. For wavelet analysis and synthesis, we used the filter bank methods in the open source Large Time-Frequency Analysis Toolbox (LTFAT [69], http://ltfat.github.io/), where our implementation of Wavelet Phase Gradient Heap Integration (WPGHI) is available by using the 'wavelet' flag in filterbankconstphase. A function to generate the wavelet filters and scripts for generating the individual experiments and figures are provided on the manuscript website http://ltfat.github.io/notes/053/, where the resulting audio files for all experiment conditions can be found as well.…”

Section: Methods (mentioning)

confidence: 99%

Characterization of Analytic Wavelet Transforms and a New Phaseless Reconstruction Algorithm

Holighaus

Koliander

Průša

et al. 2019

IEEE Trans. Signal Process.

Self Cite

We obtain a characterization of all wavelets leading to analytic wavelet transforms (WT). The characterization is obtained as a by-product of the theoretical foundations of a new method for wavelet phase reconstruction from magnitude-only coefficients. The cornerstone of our analysis is an expression of the partial derivatives of the continuous WT, which results in phase-magnitude relationships similar to the short-time Fourier transform (STFT) setting and valid for the generalized family of Cauchy wavelets. We show that the existence of such relations is equivalent to analyticity of the WT up to a multiplicative weight and a scaling of the mother wavelet. The implementation of the new phaseless reconstruction method is considered in detail and compared to previous methods. It is shown that the proposed method provides significant performance gains and a great flexibility regarding accuracy versus complexity. Additionally, we discuss the relation between scalogram reassignment operators and the wavelet transform phase gradient and present an observation on the phase around zeros of the WT.

“…The output ( , ) is a matrix whose columns represent the frequencies of the signal at a fixed time. We used the DGT implementation provided in http://ltfat.github.io/doc/gabor/sgram.html [39].…”

Section: Audio Image Representation (mentioning)

confidence: 99%

Data augmentation approaches for improving animal audio classification

Nanni

Maguolo

Paci

2020

Ecological Informatics

127

In this paper we present ensembles of classifiers for automated animal audio classification, exploiting different data augmentation techniques for training Convolutional Neural Networks (CNNs). The specific animal audio classification problems are i) birds and ii) cat sounds, whose datasets are freely available. We train five different CNNs on the original datasets and on their versions augmented by four augmentation protocols, working on the raw audio signals or their representations as spectrograms. We compared our best approaches with the state of the art, showing that we obtain the best recognition rate on the same datasets, without ad hoc parameter optimization. Our study shows that different CNNs can be trained for the purpose of animal audio classification and that their fusion works better than the stand-alone classifiers. To the best of our knowledge this is the largest study on data augmentation for CNNs in animal audio classification audio datasets using the same set of classifiers and parameters. Our MATLAB code is available at https://github.com/LorisNanni.

“…For (15), we empirically set $\sigma_{\mathrm{Ref}} = 1$ (all-ones) and $\mu = 10^{-7}$. Part of the routines in the LTFAT toolbox [1,24] were used to implement the NSGT. For each vowel type, the frame masks for an input speaker were computed from 3 × 3 pairs of signals.…”

Section: Experimental Settings (mentioning)

confidence: 99%

Harmonic-Aligned Frame Mask Based on Non-Stationary Gabor Transform with Application to Content-Dependent Speaker Comparison

Huang

Balázs

2019

Interspeech 2019

Self Cite

We propose harmonic-aligned frame mask for speech signals using non-stationary Gabor transform (NSGT). A frame mask operates on the transfer coefficients of a signal and consequently converts the signal into a counterpart signal. It depicts the difference between the two signals. In preceding studies, frame masks based on regular Gabor transform were applied to single-note instrumental sound analysis. This study extends the frame mask approach to speech signals. For voiced speech, the fundamental frequency is usually changing consecutively over time. We employ NSGT with pitch-dependent and therefore time-varying frequency resolution to attain harmonic alignment in the transform domain and hence yield harmonic-aligned frame masks for speech signals. We propose to apply the harmonic-aligned frame mask to content-dependent speaker comparison. Frame masks, computed from voiced signals of a same vowel but from different speakers, were utilized as similarity measures to compare and distinguish the speaker identities (SID). Results obtained with deep neural networks demonstrate that the proposed frame mask is valid in representing speaker characteristics and shows a potential for SID applications in limited data scenarios.

Index Terms: Non-stationary Gabor transform, frame mask, harmonic alignment, pitch-dependent frequency resolution, speaker feature, speaker comparison

1 Introduction

Time-frequency (TF) analysis is the foundation of audio and speech signal processing. The short-time Fourier transform (STFT) is a widely used tool, which can be effectively implemented by FFT [1]. STFT features straightforward interpretation of a signal. It provides uniform time and frequency resolution with linearly-spaced TF bins. The corresponding theory was generalized in the framework of Gabor analysis and Gabor frames [2, 3, 4].

Signal synthesis is an important application area of time-frequency transforms. Signal modification, denoising, separation and so on can be achieved by manipulating the analysis coefficients to synthesize a desired one. The theory of Gabor multiplier [5] or, in general terms, frame multiplier [6,7] provides a basis for the stability and invertibility of such operations. A frame multiplier is an
