Proceedings of the ACM Web Conference 2022
DOI: 10.1145/3485447.3512011

Contrastive Learning with Positive-Negative Frame Mask for Music Representation

Abstract: Self-supervised learning, especially contrastive learning, has made an outstanding contribution to the development of many deep learning research fields. Recently, researchers in the acoustic signal processing field noticed its success and leveraged contrastive learning for better music representation. Typically, existing approaches maximize the similarity between two distorted audio segments sampled from the same music. In other words, they ensure a semantic agreement at the music level. However, those coarse…
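The music-level agreement described in the abstract corresponds to a SimCLR-style NT-Xent objective: two distorted segments of the same track form a positive pair, and segments from the other tracks in the batch serve as negatives. Below is a minimal PyTorch sketch of that generic baseline only, not the paper's positive-negative frame-mask loss (which the truncated abstract does not spell out); the function name and temperature value are illustrative.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    # z1, z2: (B, D) embeddings of two distorted segments sampled from
    # the same B music tracks; other tracks in the batch act as negatives.
    B = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D)
    sim = z @ z.t() / temperature                       # pairwise cosine logits
    sim.fill_diagonal_(float('-inf'))                   # exclude self-pairs
    # The positive for row i is its counterpart at (i + B) mod 2B.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)])
    return F.cross_entropy(sim, targets)

# Toy usage: random tensors stand in for an audio encoder's outputs.
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))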


Cited by 11 publications (5 citation statements)
References 36 publications
“…Contrastive Representation Learning. Contrastive representation learning (Chen et al. 2020; He et al. 2020; Grill et al. 2020; Zhang et al. 2021, 2022; Yao et al. 2022; Gan et al. 2023) is a self-supervised learning method. It approximates the latent representations by constructing contrastive samples (positive and negative instances) to facilitate instance discrimination (Wu et al. 2018).…”
Section: Related Work (mentioning)
confidence: 99%
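As a concrete illustration of the instance discrimination this statement refers to, here is a minimal InfoNCE sketch in the spirit of Wu et al. (2018); the memory bank is simplified to a plain tensor of negative embeddings, and all names are illustrative rather than taken from any cited codebase.

import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, temperature=0.07):
    # query:     (D,)   embedding of one instance
    # positive:  (D,)   the same instance under another view/augmentation
    # negatives: (N, D) embeddings of other instances (e.g. a memory bank)
    q = F.normalize(query, dim=0)
    pos = F.normalize(positive, dim=0)
    negs = F.normalize(negatives, dim=1)
    logits = torch.cat([(q @ pos).unsqueeze(0), negs @ q]) / temperature
    # The positive sits at index 0; InfoNCE is cross-entropy against it.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

# Toy usage with random vectors standing in for encoder outputs.
loss = info_nce(torch.randn(128), torch.randn(128), torch.randn(4096, 128))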
“…3) Audio-based pretraining, which has been studied in the context of music recommendation and retrieval. These methods are used to extract latent music representations that enhance recommendation and retrieval tasks, including MusicBert [61], MART [52], PEMR [51], and UAE [6]. 4) Multimodal pretraining that aims to achieve multimodal content understanding and cross-modal alignment.…”
Section: Multimodal Pretraining for Recommendation (mentioning)
confidence: 99%
“…Moreover, users humming a query song might not necessarily know or provide its meta-information. Overall, CSI is a challenging task that has long attracted researchers due to its potential applications in music representation learning [25,57], retrieval [32,46,61] and recommendation [14,35]. However, cover songs may differ from the original song in key transposition, speed change, and structural variations, which makes identifying them difficult.…”
Section: Related Work, 2.1 Cover Song Identification (mentioning)
confidence: 99%
“…Because the labeled datasets that supervised learning methods depend on require extensive manual annotation, they are often costly and time-consuming to build, which limits the performance of supervised methods. For this reason, some audio researchers have adopted self-supervised learning approaches to learn musical representations [41,47,57,67]. For example, MusicBERT [67] models music self-representation with a multi-task learning framework.…”
Section: Music Representation Learning (mentioning)
confidence: 99%
(1 more citation statement not shown)