2019
DOI: 10.3390/app10010019

Learning Low-Dimensional Embeddings of Audio Shingles for Cross-Version Retrieval of Classical Music

Abstract: Cross-version music retrieval aims at identifying all versions of a given piece of music using a short query audio fragment. One previous approach, which is particularly suited for Western classical music, is based on a nearest neighbor search using short sequences of chroma features, also referred to as audio shingles. From the viewpoint of efficiency, indexing and dimensionality reduction are important aspects. In this paper, we extend previous work by adapting two embedding techniques; one is based on class…

Cited by 10 publications (21 citation statements)
References 39 publications
“…This was typically achieved using an objective function such as a triplet loss, yielding promising results on datasets of up to fifty thousand tracks. In a similar vein, Zalkow and Müller [31] proposed learning an embedding of short audio shingles and demonstrated the efficiency of this approach for Western classical music.…”
Section: -Today: Toward Data-driven VI Systems
Citation type: mentioning
confidence: 99%
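The triplet loss mentioned in the statement above can be illustrated with a minimal sketch. It assumes Euclidean distance and a margin of 1.0; the function name `triplet_loss` and these choices are illustrative and are not taken from the cited work. Some formulations use squared distances instead.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on three embedding vectors.

    The loss is zero once the anchor is closer to the positive
    (same piece) than to the negative (different piece) by at
    least `margin`. The margin value is an assumption.
    """
    d_pos = math.dist(anchor, positive)  # anchor to matching version
    d_neg = math.dist(anchor, negative)  # anchor to non-matching item
    return max(0.0, d_pos - d_neg + margin)


# A well-separated triplet incurs no loss.
good = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Swapping positive and negative violates the margin.
bad = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In training, such a loss would typically be averaged over many triplets drawn from versions of the same and of different pieces.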
“…Sequence windowing -Following the idea of comparing subsequences rather than entire tracks, this technique avoids a segmentation or thumbnailing step by dividing a representation into short, overlapping segments of a fixed size (also called shingles) using a predetermined hop length between offsets of consecutive windows [8], [22], [20]. After obtaining multiple shingles for each input, such shingles can be aggregated by computing their mean or median [20], the distances obtained between an item and multiple shingles can be aggregated [31], or each shingle can be used individually for fragment-level retrieval.…”
Section: Structure Invariance
Citation type: mentioning
confidence: 99%
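The sequence windowing described in the statement above can be sketched as follows. This is a minimal illustration using plain Python lists; the names `make_shingles`, `aggregate_mean`, and `aggregate_distances` are hypothetical, and using Euclidean distance with a mean as the distance aggregator is an assumption rather than the method of any cited paper.

```python
import math
from statistics import mean


def make_shingles(frames, size, hop):
    """Cut a sequence of feature frames into overlapping windows.

    `frames` is a list of equal-length feature vectors (for example
    12-dimensional chroma frames), `size` is the number of frames per
    shingle, and `hop` is the offset between consecutive windows.
    """
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    return [frames[start:start + size]
            for start in range(0, len(frames) - size + 1, hop)]


def flatten(shingle):
    """Concatenate the frames of one shingle into a single vector."""
    return [value for frame in shingle for value in frame]


def aggregate_mean(shingles):
    """Average the flattened shingles element-wise into one vector."""
    vectors = [flatten(s) for s in shingles]
    return [mean(column) for column in zip(*vectors)]


def aggregate_distances(item, shingles):
    """Mean Euclidean distance between one vector and every shingle."""
    return mean(math.dist(item, flatten(s)) for s in shingles)


# Five one-dimensional frames, windows of three frames, hop of one.
frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
shingles = make_shingles(frames, size=3, hop=1)
summary = aggregate_mean(shingles)
```

For fragment-level retrieval, each flattened shingle would instead be kept and indexed individually rather than aggregated.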
“…Specifically, Learning Low-Dimensional Embeddings of Audio Shingles for Cross-Version Retrieval of Classical Music by Frank Zalkow and Meinard Müller [4] shows through robust experiments that, when using neural networks, one can strongly reduce the audio shingle dimensionality with only a moderate loss in retrieval accuracy compared to the use of traditional principal component analysis. In [5], Baijun Xie, Jonathan C. Kim, and Chung Hyuk predict the level of arousal and valence in music from specific spectral features using traditional regression models and, then, deep convolutional neural networks.…”
Section: Capture and Analysis Of Audio And Music
Citation type: mentioning
confidence: 99%
“…This type of approach, often called translation since it implies "translating" one modality to another (e.g. being able to retrieve an image with a description of it) has received renewed attention recently given the combined efforts of the computer vision and natural language processing communities, and has been gaining more interests in the MIR community [12][13][14][15][16][17][18]. Recently, it has been proposed to learn translated representations using self-supervision [11] which is very promising since it doesn't rely on human-annotated data, but has the drawback of requiring millions pairs of raw data to train embedding models from scratch.…”
Section: Introduction
Citation type: mentioning
confidence: 99%