2018
DOI: 10.1007/978-3-319-73603-7_44
Triplet Convolutional Network for Music Version Identification

Cited by 8 publications
(8 citation statements)
References 14 publications
“…We conducted comparative experiments to evaluate the effectiveness of our method. We took a ranking-based evaluation approach [25], [47]–[49] to quantitatively evaluate the performance of compatibility estimation. Ranking-based evaluation is used to assess how accurately a target can be found from multiple candidates.…”
Section: Discussion (mentioning)
confidence: 99%
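The ranking-based evaluation described in this citing statement can be sketched in a few lines: score every candidate against a query, rank them, and record where the true target lands. The function names, toy scores, and the mean-reciprocal-rank summary below are illustrative assumptions, not details taken from the cited papers.

```python
# Minimal sketch of ranking-based evaluation (illustrative, not the cited
# papers' exact protocol): for each query, rank all candidates by similarity
# score and record the 1-based rank of the true target.

def rank_of_target(scores, target_idx):
    """Return the 1-based rank of the target candidate, higher scores first."""
    target_score = scores[target_idx]
    # Every candidate scoring strictly higher pushes the target's rank down.
    return 1 + sum(1 for s in scores if s > target_score)

def mean_reciprocal_rank(all_scores, all_targets):
    """Average of 1/rank over all queries; 1.0 means the target is always first."""
    ranks = [rank_of_target(s, t) for s, t in zip(all_scores, all_targets)]
    return sum(1.0 / r for r in ranks) / len(ranks)

# Toy example: three queries, four candidates each.
scores = [
    [0.9, 0.2, 0.1, 0.3],  # target at index 0 -> rank 1
    [0.5, 0.8, 0.4, 0.1],  # target at index 2 -> rank 3
    [0.3, 0.9, 0.2, 0.6],  # target at index 1 -> rank 1
]
targets = [0, 2, 1]
print(mean_reciprocal_rank(scores, targets))  # (1 + 1/3 + 1) / 3 ≈ 0.778
```

Retrieval papers often report mean rank or recall@k instead of MRR; all are computed from the same per-query ranks.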
See 2 more Smart Citations
“…We conducted comparative experiments to evaluate the effectiveness of our method. We took a ranking-based evaluation approach [25], [47]- [49] to quantitatively evaluate the performance of compatibility estimation. Ranking-based evaluation is used to assess how accurately a target can be found from multiple candidates.…”
Section: Discussionmentioning
confidence: 99%
“…Because data of different modalities can be treated as identical data in a joint-embedding space and trained under a common metric, deep metric learning and joint-embedding techniques perform well together. In MIR-related tasks, deep metric learning succeeds in learning joint representations over several modalities, such as a vocal and a mix [23], vocal imitations and sound recordings [24], [25], animal sounds [26], sheet music and audio spectrograms [27], music and images [28]–[31], and music and video [21], [22]. The target pair for the metric learning described in this paper consists of a vocal track and an accompaniment track.…”
Section: B. Self-Supervised and Joint-Embedding Techniques (mentioning)
confidence: 99%
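The joint-embedding idea in this statement can be sketched concretely: each modality gets its own encoder mapping into one shared space, where a single metric compares items regardless of their origin. The encoders below are fixed, hypothetical linear maps for illustration only; real systems learn them (typically as neural networks) under the common metric.

```python
# Illustrative joint-embedding sketch (assumed names, not from the cited works):
# two modality-specific encoders map different input types into the same 2-D
# space, where one shared metric compares them.

def encode_vocal(features):
    # Hypothetical vocal-track encoder: a fixed linear projection to 2-D.
    return [0.5 * features[0] + 0.5 * features[1], features[2]]

def encode_accompaniment(features):
    # Hypothetical accompaniment-track encoder into the same 2-D space.
    return [features[0], 0.5 * features[1] + 0.5 * features[2]]

def squared_distance(a, b):
    # The common metric shared by both modalities in the joint space.
    return sum((x - y) ** 2 for x, y in zip(a, b))

vocal = encode_vocal([2.0, 4.0, 6.0])           # -> [3.0, 6.0]
accomp = encode_accompaniment([3.0, 5.0, 7.0])  # -> [3.0, 6.0]
print(squared_distance(vocal, accomp))          # 0.0: treated as "the same" item
```

Training then consists of adjusting the encoders so that matching cross-modal pairs land close under this metric and non-matching pairs land far apart.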
“…At the lower end of the specificity scale are tasks such as genre recognition [36]. A medium-level specificity is associated with tasks such as audio matching [12, 14], version identification [31, 37], live song detection [28, 38], and cover song retrieval [8, 10, 13, 15, 16, 21–27, 29, 30]. In all of these tasks, one allows for variations as they typically occur in different performances and arrangements of a piece of music.…”
Section: Related Work (mentioning)
confidence: 99%
“…An example of such a loss function is the hinge loss, which has been used to learn representations for cross-modal score-to-audio embeddings [45] and for artist similarity [46]. A related loss function is the triplet loss, which was developed for the task of face recognition [33] and then adapted for audio and music processing tasks such as speech retrieval [47], sound event classification [48], audio fingerprinting [49], artist clustering [50], music similarity [51], and cover song retrieval [37]. The latter study is conceptually similar to our approach, but presents results that are worse than those achieved by more traditional approaches [21].…”
Section: Neural Network with Triplet Loss (mentioning)
confidence: 99%
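The triplet loss named in this statement has a compact standard form: pull an anchor embedding toward a positive example (e.g. a version of the same song) and push it away from a negative by at least a margin. The plain-Python sketch below uses toy 2-D embeddings for illustration; the cited systems apply the same loss to learned CNN embeddings of audio.

```python
# Standard triplet (hinge-style) loss on toy embeddings:
# max(0, d(anchor, positive) - d(anchor, negative) + margin).
# Toy vectors stand in for learned CNN embeddings of audio.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# The positive lies close to the anchor, the negative far away:
anchor   = [0.0, 0.0]
positive = [0.1, 0.0]
negative = [3.0, 0.0]
print(triplet_loss(anchor, positive, negative))  # max(0, 0.1 - 3.0 + 1.0) = 0.0
```

When the margin is not yet satisfied, the loss is positive and its gradient moves the encoder so that positives contract toward the anchor and negatives repel, which is what drives training in the triplet-based systems cited above.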