Fusing similarity functions for cover song identification

Chen, Ning; Li, Wei; Xiao, Haidong

doi:10.1007/s11042-017-4456-9

Cited by 45 publications

(34 citation statements)

References 19 publications

“…
Results on Da-TACOS
Method                      MAP     MR1
2DFTM [17]                  0.275   155
SiMPle [18]                 0.332   142
Dmax [14]                   0.322   132
Qmax [10]                   0.365   113
Qmax* [30]                  0.373   104
EarlyFusion [12]            0.426   116
LateFusion [14]             0.454   177
MOVE w/ d = 4k (ours)       0.489   43
MOVE w/ d = 16k (ours)      0.506   42

Results on YTC
Method                      MAP     MR1
SiMPle [18]                 0.591   8
2DFTM sequences [29]        0.648   8
InNet [19]                  0.660   6
SuCo-DTW [31]               0.800   3
CQT-TPPNet [20]             0.859   3
MOVE w/ d = 16k (ours)      0.885   3

Table 2. Comparison of state-of-the-art VI systems (best results are highlighted in bold).
…”

Section: Map Mr1 (mentioning)

confidence: 99%
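
The two figures reported per system in the statement above are mean average precision (MAP, higher is better) and mean rank of the first correct match (MR1, lower is better). The following is a minimal sketch of how such scores are commonly computed, assuming each query comes with a full ranked list of candidates marked relevant or not. The function names and the input layout are illustrative and are not taken from any of the cited papers.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans, best-ranked candidate first.
    Assumes the list covers every relevant item for the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit
    return precision_sum / hits if hits else 0.0


def rank_of_first_correct(ranked_relevance):
    """1-based rank of the first relevant candidate, or None if there is none."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return rank
    return None


def evaluate(queries):
    """Return (MAP, MR1) over a list of per-query relevance lists."""
    aps = [average_precision(q) for q in queries]
    firsts = [rank_of_first_correct(q) for q in queries]
    firsts = [r for r in firsts if r is not None]  # skip queries with no match
    return sum(aps) / len(aps), sum(firsts) / len(firsts)


if __name__ == "__main__":
    toy_queries = [[True, False, True], [False, True, False]]
    print(evaluate(toy_queries))
```

In this toy case the first query finds a correct version at rank 1 and the second at rank 2, so MR1 is 1.5.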

Accurate and Scalable Version Identification Using Musically-Motivated Embeddings

Yesiler

Serrà

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation, and introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and study the relation between embedding dimensionality and model performance.

Index Terms: cover song identification, deep learning, music embedding, network encoder.
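
The abstract mentions a triplet loss with hard triplet mining over Euclidean embeddings. The sketch below illustrates that general idea using the common "batch-hard" variant, in which each anchor is paired with its farthest same-piece embedding and its closest different-piece embedding. This variant, the margin value, and all function names are assumptions made for illustration; the abstract does not specify MOVE's actual mining procedure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Mean hinge loss over anchors, using the hardest positive and negative.

    embeddings: list of vectors; labels: piece identifier per vector.
    Hypothetical helper, not the MOVE implementation.
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        positives = [euclidean(anchor, e)
                     for j, e in enumerate(embeddings)
                     if j != i and labels[j] == labels[i]]
        negatives = [euclidean(anchor, e)
                     for j, e in enumerate(embeddings)
                     if labels[j] != labels[i]]
        if not positives or not negatives:
            continue  # no valid triplet for this anchor
        hardest_pos = max(positives)  # farthest version of the same piece
        hardest_neg = min(negatives)  # closest version of another piece
        losses.append(max(0.0, hardest_pos - hardest_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
    pieces = ["a", "a", "b", "b"]
    print(batch_hard_triplet_loss(vectors, pieces, margin=1.0))
```

The loss is zero once every anchor is closer to all versions of its own piece than to any other piece by at least the margin, which is what makes nearest-neighbour retrieval in the embedding space workable.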

“…
Results on Youtube
Method                 MAP     P@10    MR1     Time
DPLA [2]               0.525   0.132   9.43    2420s
SiMPle [15]            0.591   0.140   7.91    18.7s
Fingerprinting [16]    0.648   0.145   8.27    -
SuCo-DTW [17]          0.800   0.180   3.42    4.59s
Ki-CNN [8]             0.656   0.155   6.26    0.35ms
TPPNet [9]             0.859   0.188   2.85    0.04ms
CQT-Net                0.917   0.192   2.50    0.04ms

Results on Covers80
Method                 MAP     P@10    MR1     Time
NCP-WIDI [18]          0.645   -       -       -
CRP [3]                0.544   0.061   -       -
Fusing [19]            0.625   0.071   -       -
Ki-CNN [8]             0.506   0.068   16.4    0.55ms
TPPNet [9]             0.744   0.086   6.88    0.06ms
CQT-Net                0.840   0.091   3.85    0.06ms

Results on Mazurkas
Method                 MAP     P@10    MR1     Time
DTW [15]               0.882   0.949   4.05    -
NCD [20]               0.767   -       -       -
Compression [21]       0.795   -       -       -
Fingerprinting [22]    0.819   -       -       -
SiMPle [15]            0.880   0.952   2.33    -
SuCo-repeat [17]       0.850   0.940   2.77    -
2DFM [4]               0 […]

Table 1. Performance on different datasets (- indicates the results are not shown in original works).
…”

Section: Mr1 Time (mentioning)

confidence: 99%

Learning a Representation for Cover Song Identification Using Convolutional Neural Network

Chen

et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Cover song identification represents a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cover versions. Previous works typically utilize hand-crafted features and alignment algorithms for the task. More recently, further breakthroughs are achieved employing neural network approaches. In this paper, we propose a novel Convolutional Neural Network (CNN) architecture based on the characteristics of the cover song task. We first train the network through classification strategies; the network is then used to extract music representation for cover song identification. A scheme is designed to train robust models against tempo changes. Experimental results show that our approach outperforms state-of-the-art methods on all public datasets, improving the performance especially on the large dataset.

“…For instance, Tzanetakis et al proposed pitch histogram to represent tonality [Tzanetakis et al, 2003]. Chroma and its variants were extensively deployed to this task [Ellis and Poliner, 2007; Serrà et al, 2008; Grosche and Müller, 2012; Silva et al, 2016; Cheng et al, 2017].…”

Section: Audio Feature (mentioning)

confidence: 99%

“…For sequential representations, dynamic programming was a routinely used approach to measure the similarity of sequential descriptors. Through searching the optimal correspondences between two sequential representations, these algorithms helped reduce the impacts of local structure variations and thus achieved high precision [Bello, 2007; Serrà et al, 2008; Martin et al, 2012; Cheng et al, 2017]. For other approaches, though they did not use dynamic programming explicitly, they computed cross-similarity between the sequences and required comparable complexity [Grosche and…”

Section: Similarity Measure (mentioning)

confidence: 99%
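
The statement above describes dynamic programming that searches for the optimal correspondence between two descriptor sequences. As one concrete instance of that family, the sketch below implements classic dynamic time warping (DTW) over sequences of feature frames. It is only an illustration: the cited works use their own recurrences and features, and the names `frame_distance` and `dtw_distance` are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (e.g. chroma vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment between two sequences.

    Runs in O(len(seq_a) * len(seq_b)) time, which is the quadratic cost
    that makes alignment-based matching expensive on large collections.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cost aligning the first i frames of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # frame of a repeated in b
                                 cost[i][j - 1],      # frame of b repeated in a
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]


if __name__ == "__main__":
    original = [[0.0], [1.0], [2.0]]
    slower = [[0.0], [1.0], [1.0], [2.0]]  # same content, one frame held longer
    print(dtw_distance(original, slower))
```

Because a frame may be matched to several frames of the other sequence, a locally stretched passage adds no cost, which is how such alignment absorbs local tempo and structure variations.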

Temporal Pyramid Pooling Convolutional Neural Network for Cover Song Identification

Chen

et al. 2019

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment methods, and further breakthrough is hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning toward this task. We show that they could be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is utilized to extract information on different scales and transform songs with different lengths into fixed-dimensional representations. Furthermore, a training scheme is designed to enhance the robustness of our model. Extensive experiments demonstrate that combined with these techniques, our approach is robust against musical variations existing in cover songs and outperforms state-of-the-art methods on several datasets with low time complexity.
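
The abstract states that Temporal Pyramid Pooling turns songs of different lengths into fixed-dimensional representations. A minimal sketch of that mechanism follows, assuming max pooling and pyramid levels of 1, 2 and 4 temporal bins. Both choices, and the function name, are illustrative assumptions rather than the configuration used in the paper.

```python
def temporal_pyramid_pool(frames, levels=(1, 2, 4)):
    """Pool a variable-length sequence of frames into a fixed-length vector.

    frames: non-empty list of equal-length feature vectors, one per time step.
    levels: number of equal temporal bins at each pyramid level.
    Output length is channels * sum(levels), regardless of len(frames).
    """
    n = len(frames)
    channels = len(frames[0])
    pooled = []
    for bins in levels:
        for b in range(bins):
            start = (b * n) // bins
            # Guarantee at least one frame per bin, even for very short inputs.
            end = max(start + 1, ((b + 1) * n) // bins)
            segment = frames[start:end]
            # Max over time within the bin, separately for each channel.
            pooled.extend(max(f[c] for f in segment) for c in range(channels))
    return pooled


if __name__ == "__main__":
    short_song = [[0.1, 0.2]] * 5
    long_song = [[0.1, 0.2]] * 500
    print(len(temporal_pyramid_pool(short_song)), len(temporal_pyramid_pool(long_song)))
```

Both inputs produce a vector of the same length, so songs of any duration can be compared with a single distance computation.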
