Joint time-frequency scattering for audio classification

Andén, Joakim; Lostanlen, Vincent; Mallat, Stéphane

doi:10.1109/mlsp.2015.7324385

Cited by 84 publications

(141 citation statements)

References 13 publications

Supporting

Mentioning: 140

Contrasting


“…We use the SPGL1 solver [127] with at most 200 iterations, and 2 := 0.01. The second system is MAPsCAT, which uses features computed with the scattering transform [3]. This produces 40 feature vectors of 469 dimensions for a 30-s excerpt.…”

Section: Methods (mentioning)

confidence: 99%

“…We use the Echo Nest Musical Fingerprinter (ENMFP) 3 to generate a fingerprint of every excerpt in GTZAN and to query the Echo Nest database having over 30,000,000 songs. The second column of Table 1 shows that this identifies only 60.6% of the excerpts.…”

Section: Identifying Excerpts (mentioning)

confidence: 99%

“…For Classify, most works measure MGR performance by classification accuracy (the ratio of "correct" predictions to all observations) computed from k-fold stratified cross-validation (kfCV), e.g., 2fCV (4 papers) [7,22,23,56], 3fCV (3 papers) [18,71,74], 5fCV (6 papers) [3,13,30,31,53,100], and 10fCV (55 papers) [2,5,9,11,14,16,17,24-26,28,29,34,35,37,39-42,44,47-51,57,58,60-64,66-68,70,72,73,75,76,78,79,82-85,88-91,94-96,98,99]. Most of these use a single run of cross-validation; however, some perform multiple runs, e.g., 10 independent runs of 2fCV (10x2CV) [56] or 20x2fCV [22,23], 10x3fCV [71,74], and 10x10fCV [37,70,72,75,83-85].…”

Section: Using Gtzan (mentioning)

confidence: 99%
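The evaluation protocol named in the statement above (classification accuracy averaged over k stratified folds) can be illustrated with a minimal sketch. It uses only the Python standard library. The function names, the round-robin fold assignment, and the majority-class stand-in classifier are all assumptions made for illustration; none of them comes from the cited papers.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each index to one of k folds, keeping class proportions roughly equal."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the items of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def accuracy(true_labels, predicted_labels):
    """Ratio of correct predictions to all observations."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def cross_validated_accuracy(labels, k, train_and_predict):
    """Mean accuracy over k folds.

    train_and_predict(train_idx, test_idx) must return one prediction per test index.
    """
    folds = stratified_folds(labels, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        predictions = train_and_predict(train_idx, test_idx)
        scores.append(accuracy([labels[idx] for idx in test_idx], predictions))
    return sum(scores) / k


def majority_baseline(labels):
    """Hypothetical stand-in classifier: always predicts the most common training label."""
    def train_and_predict(train_idx, test_idx):
        counts = defaultdict(int)
        for idx in train_idx:
            counts[labels[idx]] += 1
        # Sorting first makes ties resolve to the alphabetically first label.
        winner = max(sorted(counts), key=counts.get)
        return [winner] * len(test_idx)
    return train_and_predict


if __name__ == "__main__":
    genre_labels = ["rock"] * 6 + ["jazz"] * 4
    score = cross_validated_accuracy(genre_labels, 2, majority_baseline(genre_labels))
    print(f"2fCV accuracy of the majority baseline: {score:.2f}")
```

A multiple-run protocol such as 10x2fCV would repeat this with a different shuffling of the indices on each run and average the results. The sketch omits shuffling to stay deterministic.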


The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval

Sturm

2014

Journal of New Music Research


The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.


“…Since its introduction in [8], the scattering transform has found successful applications in, for example, audio genre, visual textures or medical data classification [3,11,12]. …”

Section: The Scattering Transform Of F Is (mentioning)

confidence: 99%

Wavelet transform modulus: phase retrieval and scattering

Waldspurger

2018

Journées Équations Aux Dérivées Partielles


We discuss the problem that consists in reconstructing a function from the modulus of its wavelet transform. In the case where the wavelets are Cauchy wavelets, all analytic functions are uniquely determined by this modulus. Additionally, although it is not uniformly continuous, the reconstruction operator enjoys a form of local stability. We describe these two results, and give an idea of the proof of the first one. To conclude, we present a related result on a more sophisticated operator, based on the wavelet transform modulus: the scattering transform.


“…This is similar to MFCC coefficient computation but a scattering-subband filterbank is used. The block diagram of the tanh based Scattered Transform Cepstral Coefficients (tanh-STCC) feature extraction algorithm is shown in Figure 2: The amplitude range of recorded sound data is normalized between -1 and 1 [9,13,14] before the filterbank. Pre-emphasis, framing, windowing, logarithm and the DCT block are the same as the ordinary MFCC computation.…”

Section: Tanh Based Scattered Transform Cepstral Coefficients (Ta… (mentioning)

confidence: 99%
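The front-end stages named in the statement above (amplitude normalization to the range -1 to 1, pre-emphasis, framing, and windowing) can be sketched as follows. This is a generic MFCC-style front end written with the Python standard library, not the cited authors' implementation. The pre-emphasis coefficient of 0.97, the Hamming window, and all function names are assumptions. The filterbank, tanh, logarithm, and DCT stages are not shown.

```python
import math


def normalize_peak(samples):
    """Scale samples so the largest absolute value is 1, giving the range [-1, 1]."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # All-zero input is left unchanged.
    return [s / peak for s in samples]


def pre_emphasis(samples, coefficient=0.97):
    """Apply y[n] = x[n] - coefficient * x[n-1]; the value 0.97 is an assumed default."""
    if not samples:
        return []
    emphasized = [samples[0]]
    for n in range(1, len(samples)):
        emphasized.append(samples[n] - coefficient * samples[n - 1])
    return emphasized


def frame_and_window(samples, frame_length, hop_length):
    """Split into overlapping frames and multiply each by a Hamming window.

    Assumes frame_length > 1; trailing samples that do not fill a frame are dropped.
    """
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_length - 1))
        for n in range(frame_length)
    ]
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append([samples[start + n] * window[n] for n in range(frame_length)])
    return frames


if __name__ == "__main__":
    recording = [0.0, 2.0, -4.0, 1.0, 3.0, -1.0, 0.5, 2.5]
    prepared = pre_emphasis(normalize_peak(recording))
    frames = frame_and_window(prepared, frame_length=4, hop_length=2)
    print(f"{len(frames)} frames of {len(frames[0])} samples each")
```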

Time-scale wavelet scattering using hyperbolic tangent function for vessel sound classification

Can, Akbas, Çetin

2017

2017 25th European Signal Processing Conference (EUSIPCO)


We introduce a time-frequency scattering method using hyperbolic tangent function for vessel sound classification. The sound data is wavelet transformed using a two-channel filter-bank and filter-bank outputs are scattered using tanh function. A feature vector similar to mel-scale cepstrum is obtained after a wavelet packet transform-like structure approximating the mel-frequency scale. Feature vectors of vessel sounds are classified using a support vector machine (SVM). Experimental results are presented and the new feature extraction method produces better classification results than the ordinary Mel-Frequency Cepstral Coefficients (MFCC) vectors.
