Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/673

Temporal Pyramid Pooling Convolutional Neural Network for Cover Song Identification

Abstract: Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment methods, and further breakthrough is hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning toward this task. We show that they could be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is utilized to extract information on different scale…
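The abstract names Temporal Pyramid Pooling but is cut off before describing it. The following is only a minimal sketch of the general idea, not the authors' implementation: a feature map of any length along the time axis is max-pooled over progressively finer temporal bins, and the results are concatenated into a fixed-length vector. The function name `temporal_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the choice of max pooling are assumptions made for illustration.

```python
def temporal_pyramid_pool(features, levels=(1, 2, 4)):
    """Pool a variable-length feature map into a fixed-length vector.

    features: list of channels, each a list of T values along time.
    levels:   number of temporal bins per pyramid level (assumed values).
    Returns a flat list of length len(features) * sum(levels).
    """
    pooled = []
    for channel in features:
        t = len(channel)
        for n_bins in levels:
            if t < n_bins:
                raise ValueError("time axis shorter than number of bins")
            for i in range(n_bins):
                # Bin edges: floor for the start, ceiling for the end,
                # so every bin is non-empty and the bins cover all frames.
                start = (i * t) // n_bins
                end = -(-((i + 1) * t) // n_bins)
                pooled.append(max(channel[start:end]))
    return pooled


# Two inputs with different lengths give vectors of the same size.
short_clip = [[1, 5, 2, 4], [0, -1, 3, 2]]
long_clip = [[1, 5, 2, 4, 9, 0, 3], [0, -1, 3, 2, 7, 7, 1]]
short_vec = temporal_pyramid_pool(short_clip, levels=(1, 2))
long_vec = temporal_pyramid_pool(long_clip, levels=(1, 2))
```

Because the output size depends only on the channel count and the levels, recordings of different durations can be compared through vectors of equal length.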

Citations: cited by 34 publications (46 citation statements)
References: references 2 publications
“…Results on Da-TACOS
Method                       MAP     MR1
2DFTM [17]                   0.275   155
SiMPle [18]                  0.332   142
Dmax [14]                    0.322   132
Qmax [10]                    0.365   113
Qmax* [30]                   0.373   104
EarlyFusion [12]             0.426   116
LateFusion [14]              0.454   177
MOVE w/ d = 4 k (ours)       0.489   43
MOVE w/ d = 16 k (ours)      0.506   42

Results on YTC
Method                       MAP     MR1
SiMPle [18]                  0.591   8
2DFTM sequences [29]         0.648   8
InNet [19]                   0.660   6
SuCo-DTW [31]                0.800   3
CQT-TPPNet [20]              0.859   3
MOVE w/ d = 16 k (ours)      0.885   3

Table 2. Comparison of state-of-the-art VI systems (best results are highlighted in bold).…”
Section: Map Mr1 | mentioning
confidence: 99%
“…Each song in Youtube has 7 versions, with 2 original versions and 5 different versions and thus results in 350 recordings in total. In our experiment, we use the 100 original versions as references and the others as queries following the same as [15,9,8].…”
Section: Dataset | mentioning
confidence: 99%
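The statements quoted here report MAP and MR1 for a query-against-reference retrieval setup. As a hedged illustration of how such figures are commonly computed (the standard definitions of mean average precision and mean rank of the first correct match, not code from any cited paper), the sketch below scores ranked result lists. The function names and the toy identifiers are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at each rank where a relevant item occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def rank_of_first_hit(ranked_ids, relevant_ids):
    """1-based rank of the first relevant item in the ranking."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return rank
    raise ValueError("no relevant item in ranking")


def evaluate(rankings):
    """rankings: list of (ranked_ids, relevant_ids) pairs, one per query.

    Returns (MAP, MR1) averaged over all queries.
    """
    aps = [average_precision(r, rel) for r, rel in rankings]
    firsts = [rank_of_first_hit(r, rel) for r, rel in rankings]
    return sum(aps) / len(aps), sum(firsts) / len(firsts)


# Toy example with two queries over four hypothetical references.
toy_rankings = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2, first hit at rank 1
    (["b", "a", "d", "c"], {"d"}),       # AP = 1/3, first hit at rank 3
]
mean_ap, mean_rank1 = evaluate(toy_rankings)
```

Higher MAP is better, while lower MR1 is better, which matches the direction of the results quoted in these statements.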
“…Results on Youtube
DPLA [2]              0.525   0.132   9.43   2420s
SiMPle [15]           0.591   0.140   7.91   18.7s
Fingerprinting [16]   0.648   0.145   8.27   -
SuCo-DTW [17]         0.800   0.180   3.42   4.59s
Ki-CNN [8]            0.656   0.155   6.26   0.35ms
TPPNet [9]            0.859   0.188   2.85   0.04ms
CQT-Net               0.917   0.192   2.50   0.04ms

Results on Covers80
NCP-WIDI [18]         0.645   -       -      -
CRP [3]               0.544   0.061   -      -
Fusing [19]           0.625   0.071   -      -
Ki-CNN [8]            0.506   0.068   16.4   0.55ms
TPPNet [9]            0.744   0.086   6.88   0.06ms
CQT-Net               0.840   0.091   3.85   0.06ms

Results on Mazurkas
DTW [15]              0.882   0.949   4.05   -
NCD [20]              0.767   -       -      -
Compression [21]      0.795   -       -      -
Fingerprinting [22]   0.819   -       -      -
SiMPle [15]           0.880   0.952   2.33   -
SuCo-repeat [17]      0.850   0.940   2.77   -
2DFM [4]              0

Table 1. Performance on different datasets (- indicates the results are not shown in original works).…”
Section: Mr1 Time | mentioning
confidence: 99%