Proceedings of the 21st International Conference on World Wide Web 2012
DOI: 10.1145/2187980.2188219

Melody, bass line, and harmony representations for music version identification

Abstract: In this paper we compare the use of different musical representations for the task of version identification (i.e. retrieving alternative performances of the same musical piece). We automatically compute descriptors representing the melody and bass line using a state-of-the-art melody extraction algorithm, and compare them to a harmony-based descriptor. The similarity of descriptor sequences is computed using a dynamic programming algorithm based on nonlinear time series analysis which has been successfully us…
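
The truncated sentence above refers to a dynamic-programming alignment rooted in nonlinear time series analysis. A minimal sketch of that idea follows, assuming a Qmax-style scoring over a binary cross-recurrence plot in the spirit of Serrà et al.; the descriptor computation, the mutual-nearest-neighbour threshold kappa, and the penalty values are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np

def cross_recurrence(x, y, kappa=0.1):
    """Binary cross-recurrence plot between two descriptor sequences.

    x: (N, d) and y: (M, d) frame-wise descriptors (e.g. transposition-
    normalised chroma, melody, or bass-line features). A cell (i, j) is set
    when frame i of x and frame j of y are mutual kappa-nearest neighbours.
    """
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)   # (N, M)
    kx = max(1, int(kappa * y.shape[0]))
    ky = max(1, int(kappa * x.shape[0]))
    thr_rows = np.sort(d, axis=1)[:, kx - 1][:, None]   # per frame of x
    thr_cols = np.sort(d, axis=0)[ky - 1, :][None, :]   # per frame of y
    return ((d <= thr_rows) & (d <= thr_cols)).astype(float)

def qmax_similarity(R, gap_onset=5.0, gap_extension=0.5):
    """Local alignment over a binary cross-recurrence matrix R.

    Dynamic programming accumulates path length while recurrences continue
    and applies onset/extension penalties when they are disrupted; the
    maximum of the cumulative matrix is the version similarity score.
    """
    N, M = R.shape
    Q = np.zeros((N, M))
    gamma = lambda r: gap_onset if r == 1 else gap_extension
    for i in range(2, N):
        for j in range(2, M):
            if R[i, j] == 1:
                Q[i, j] = max(Q[i-1, j-1], Q[i-2, j-1], Q[i-1, j-2]) + 1
            else:
                Q[i, j] = max(0.0,
                              Q[i-1, j-1] - gamma(R[i-1, j-1]),
                              Q[i-2, j-1] - gamma(R[i-2, j-1]),
                              Q[i-1, j-2] - gamma(R[i-1, j-2]))
    return Q.max()

# Usage sketch: score = qmax_similarity(cross_recurrence(desc_a, desc_b))
```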

Cited by 19 publications (19 citation statements); references 18 publications.
“…Considering that the similarity between two tracks can be calculated based on different descriptors and similarity functions, the complementary properties are neglected when only a single similarity function is used. It has been verified [6][7][8] that different descriptors and similarity functions are complementary to each other in the CSI task. To take full advantage of both the common and the complementary information contained in different descriptors and similarity functions when describing the similarity between tracks, some researchers began to study similarity fusion algorithms for CSI.…”
Section: Introduction (citation type: mentioning)
confidence: 85%
“…Then, the maximum of the similarities obtained separately from the main melody, the accompaniment, and the mixture signal was taken as the final similarity. In [6], the standard classification-based fusion strategy [10] was adopted to fuse the similarities of three related yet different descriptors (harmony, melody, and bass line). In [11], the fusion of different similarities was achieved by projecting them into a multi-dimensional space whose dimensionality was the number of similarities considered.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
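
The classification-based fusion mentioned here can be pictured as treating the per-descriptor similarities between two tracks as a small feature vector and letting a classifier trained on labelled version / non-version pairs produce a single fused score. The sketch below uses scikit-learn's SVC; the feature design, classifier choice, and training protocol of the cited works are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

def fuse_similarities(train_sims, train_labels, query_sims):
    """Classification-based fusion of per-descriptor similarities.

    train_sims   : (n_pairs, n_descriptors) similarities (e.g. harmony,
                   melody, bass line) for track pairs with known labels
    train_labels : (n_pairs,) 1 = version pair, 0 = non-version pair
    query_sims   : (n_query_pairs, n_descriptors) similarities to fuse

    The signed distance to the decision boundary is returned, so the fused
    values can still be ranked like an ordinary similarity.
    """
    clf = SVC(kernel="rbf", C=1.0).fit(np.asarray(train_sims), train_labels)
    return clf.decision_function(np.asarray(query_sims))
```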
“…Since only chroma descriptors were considered, the fused similarity still accounted for a single musical facet, the harmony. To solve this problem, in [25] the similarities based on three related yet different descriptors (harmony, melody, and bass line) were fused using a standard classification approach similar to [24]. In [31], the fusion of different similarities was achieved by projecting all of them into a multi-dimensional space whose dimensionality was the number of similarities considered.…”
Section: Information Fusion for CSI (citation type: mentioning)
confidence: 99%
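
One simple reading of such a multi-dimensional similarity space, offered here purely as an illustration and not as the scheme of [31], is to place every candidate at a point whose coordinates are its similarities to the query and to rank candidates by their distance to the ideal point where every similarity is maximal.

```python
import numpy as np

def rank_in_similarity_space(sims):
    """sims: (n_candidates, n_similarities) matrix for one query.

    Each candidate is a point in an n_similarities-dimensional space.
    Normalise each axis to [0, 1] and rank candidates by Euclidean distance
    to the ideal point (all similarities maximal); closest candidates first.
    """
    sims = np.asarray(sims, dtype=float)
    lo, hi = sims.min(axis=0), sims.max(axis=0)
    z = (sims - lo) / np.maximum(hi - lo, 1e-12)
    return np.argsort(np.linalg.norm(1.0 - z, axis=1))
```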
“…To solve this problem, some researchers began to study descriptor or similarity fusion models for the CSI task [22][23][24][25][26] (see Section 2). In this paper, we propose a two-layer similarity fusion model for the CSI task, aiming to further improve identification accuracy and classification efficiency.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Based on this fact, for example, the Hydra system [7] combines features and distances extracted with different parameters, which are fed to a Support Vector Machine whose output, for each pair of songs, is a single-bit cover/non-cover decision. A similar approach is used in [8], where a distance is calculated over three different audio descriptors and a classifier is trained on a subset of pairs known to be covers or non-covers. In our work, instead, we do not apply any classification and no training is needed.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
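
In contrast to the score-level fusion sketched earlier, the Hydra-style approach described in this excerpt ends in a hard decision: each song pair is represented by features and distances computed under different parameter settings, and a Support Vector Machine trained on pairs with known cover / non-cover labels emits a single-bit verdict for unseen pairs. The sketch below is a schematic reading of that pipeline, with all names and parameters assumed for illustration.

```python
from sklearn.svm import SVC

def cover_decision(pair_features, pair_labels, new_pair_features):
    """Hard cover / non-cover decision for song pairs.

    pair_features     : (n_pairs, n_features) features and distances per pair
    pair_labels       : (n_pairs,) 1 = cover pair, 0 = non-cover pair
    new_pair_features : (n_new_pairs, n_features) pairs to classify

    Returns one binary decision per new pair.
    """
    clf = SVC(kernel="rbf").fit(pair_features, pair_labels)
    return clf.predict(new_pair_features)
```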