“…Descriptor class

Timbral
  Bark bands [35], [37]
  MFCCs [13], [35], [37], [38]
  Pitch [39], pitch centroid [40]
  Spectral centroid, spread, kurtosis, rolloff, decrease, skewness [35], [37], [41]
  High-frequency content [39], [41]
  Spectral complexity [35]
  Spectral crest, flatness, flux [37], [41]
  Spectral energy, energy bands, strong peak, tristimulus [41]
  Inharmonicity, odd-to-even harmonic energy ratio [37]

Rhythmic
  BPM, onset rate [35], [39], [41]
  Beats loudness, beats loudness bass [40]

Tonal
  Transposed and untransposed harmonic pitch class profiles, key strength [35], [42]
  Tuning frequency [42]
  Dissonance [35], [43]
  Chord change rate [35]
  Chords histogram, equal tempered deviations, non-tempered/tempered energy ratio, diatonic strength [40]

Miscellaneous
  Average loudness [37]
  Zero-crossing rate [13], [37]

1) Euclidean distance based on principal component analysis ($L_2$-PCA): As a starting point, we follow the ideas proposed by Cano et al. [19] and apply an unweighted Euclidean metric to a manually selected subset of the descriptors outlined above (see footnote 6). This subset includes Bark bands, pitch, spectral centroid, spread, kurtosis, rolloff, decrease, skewness, high-frequency content, spectral complexity, spectral crest, flatness, flux, spectral energy, energy bands, strong peak, tristimulus, inharmonicity, odd-to-even harmonic energy ratio, beats loudness, beats loudness bass, untransposed harmonic pitch class profiles, key strength, average loudness, and zero-crossing rate.…”
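The general idea of an unweighted Euclidean metric computed after principal component analysis can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the excerpt does not say how descriptors are normalized or how many components are kept, so the z-score standardization, the choice of two components, the function names, and the toy feature values below are all hypothetical.

```python
import numpy as np


def pca_project(features, n_components):
    """Standardize each descriptor, then project onto the top principal axes.

    Assumption: z-score standardization is used so that descriptors with
    large numeric ranges do not dominate; the source does not specify this.
    """
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant descriptors
    Z = (X - mean) / std
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def euclidean_distances(projected, query_index):
    """Unweighted Euclidean distance from one track to every track."""
    diffs = projected - projected[query_index]
    return np.sqrt((diffs ** 2).sum(axis=1))


# Toy data: 6 hypothetical tracks x 4 hypothetical descriptor values
# (made-up numbers, for illustration only).
features = np.array([
    [0.20, 1.0, 3.5, 0.10],
    [0.21, 1.1, 3.4, 0.12],
    [0.90, 4.0, 0.5, 0.70],
    [0.50, 2.5, 2.0, 0.30],
    [0.10, 0.5, 4.0, 0.90],
    [0.75, 3.0, 1.0, 0.05],
])

projected = pca_project(features, n_components=2)  # 2 is an arbitrary choice
dists = euclidean_distances(projected, query_index=0)
ranking = np.argsort(dists)  # track indices ordered from nearest to farthest
```

Because the principal axes are orthonormal, dropping components can only shrink distances relative to the full standardized space, so the number of retained components controls how much of the original geometry the ranking preserves.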