Tempo induction algorithm in MP3 compressed domain

D'Aguanno, Antonello; Vercellesi, G.

doi:10.1145/1290082.1290105

Cited by 4 publications

(1 citation statement)

References 15 publications


“…It is well known that Zernike moment has been widely used in many image-related research fields such as image recognition [11], image watermarking [12], human face recognition [13], and image analysis [14] due to its prominent property of strong robustness and rotation, scale, and translation (RST) invariance. So far, various compressed domain audio features including scale factors [15,16], MP3 window-switching pattern [17,18], basic MDCT coefficients and derived spectral energy, energy variation, duration of energy peaks, amplitude envelope, spectrum centroid, spectrum spread, spectrum flux, roll-off, RMS, rhythmic content like beat histogram [19][20][21][22][23][24] have been used in different applications such as retrieval, segmentation, genre classification, speech/music discrimination, summarization, singer identification, watermarking, and beat tracing/tempo induction. However, in spite of the extensive use in various image-related research fields for years, to the authors' knowledge, Zernike moment has not yet been applied to music information retrieval.…”

Section: Introduction (mentioning)

confidence: 99%

Low-order auditory Zernike moment: a novel approach for robust music identification in the compressed domain

Xiao

Liu

2013

EURASIP J. Adv. Signal Process.


Audio identification via fingerprint has been an active research field for years. However, most previously reported methods work on the raw audio format in spite of the fact that nowadays compressed format audio, especially MP3 music, has grown into the dominant way to store music on personal computers and/or transmit it over the Internet. It will be interesting if a compressed unknown audio fragment could be directly recognized from the database without decompressing it into the wave format at first. So far, very few algorithms run directly on the compressed domain for music information retrieval, and most of them take advantage of the modified discrete cosine transform coefficients or derived cepstrum and energy type of features. As a first attempt, we propose in this paper utilizing compressed domain auditory Zernike moment adapted from image processing techniques as the key feature to devise a novel robust audio identification algorithm. Such fingerprint exhibits strong robustness, due to its statistically stable nature, against various audio signal distortions such as recompression, noise contamination, echo adding, equalization, band-pass filtering, pitch shifting, and slight time scale modification. Experimental results show that in a music database which is composed of 21,185 MP3 songs, a 10-s long music segment is able to identify its original near-duplicate recording, with average top-5 hit rate up to 90% or above even under severe audio signal distortions.


Music information retrieval in compressed audio files: a survey

Zampoglou

Malamos

2014

New Review of Hypermedia and Multimedia


Robust audio identification for MP3 popular music

Liu

Xue

2010

Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval


Audio identification via fingerprint has been an active research field with wide applications for years. Many technical papers were published and commercial software systems were also employed. However, most of these previously reported methods work on the raw audio format in spite of the fact that nowadays compressed format audio, especially MP3 music, has grown into the dominant way to store on personal computers and transmit on the Internet. It would be interesting if a compressed unknown audio fragment is able to be directly recognized from the database without the fussy and time-consuming decompression-identification-recompression procedure. So far, very few algorithms run directly in the compressed domain for music information retrieval, and most of them take advantage of MDCT coefficients or derived energy type of features. As a first attempt, we propose in this paper utilizing compressed-domain spectral entropy as the audio feature to implement a novel audio fingerprinting algorithm. The compressed songs stored in a music database and the possibly distorted compressed query excerpts are first partially decompressed to obtain the MDCT coefficients as the intermediate result. Then by grouping granules into longer blocks, remapping the MDCT coefficients into 192 new frequency lines to unify the frequency distribution of long and short windows, and defining 9 new subbands which cover the main frequency bandwidth of popular songs in accordance with the scale-factor bands of short windows, we calculate the spectral entropy of all consecutive blocks and come to the final fingerprint sequence by means of magnitude relationship modeling. Experiments show that such fingerprints exhibit strong robustness against various audio signal distortions like recompression, noise interference, echo addition, equalization, band-pass filtering, pitch shifting, and slight time-scale modification etc. 
For 5s-long query examples which might be severely degraded, an average top-five retrieval precision rate of more than 90% can be obtained in our test data set composed of 1822 popular songs.
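The fingerprinting pipeline summarized in this abstract (per-subband spectral entropy of consecutive blocks, followed by magnitude relationship modeling) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the subband edges, the block length, or the exact comparison rule, so `band_edges` is a hypothetical parameter and the rule "bit is 1 when a subband's entropy rises from one block to the next" is an assumption. The input is assumed to be already-extracted MDCT coefficients; partial MP3 decoding is not shown.

```python
import math


def spectral_entropy(coeffs):
    """Shannon entropy (in bits) of the normalized energy distribution of one subband."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # silent subband: define the entropy as zero
    entropy = 0.0
    for e in energies:
        if e > 0.0:
            p = e / total
            entropy -= p * math.log2(p)
    return entropy


def block_entropies(block, band_edges):
    """Entropy of each subband of one block; band_edges holds hypothetical cut points."""
    return [
        spectral_entropy(block[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


def fingerprint(blocks, band_edges):
    """Assumed magnitude-relationship rule: 1 if a subband's entropy increases
    between consecutive blocks, otherwise 0."""
    ents = [block_entropies(b, band_edges) for b in blocks]
    return [
        [1 if cur > prev else 0 for prev, cur in zip(a, b)]
        for a, b in zip(ents, ents[1:])
    ]
```

In the setting the abstract describes, each block would hold 192 remapped frequency lines and `band_edges` would delimit 9 subbands; the sketch works for any block length and any increasing list of edges.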
