2016
DOI: 10.1109/tmm.2016.2557722

Bridging Music and Image via Cross-Modal Ranking Analysis

Abstract: Human perceptions of music and images are closely related, since both can evoke similar sensations, such as emotion, motion, and power. This paper explores whether and how music and images can be automatically matched by machines. The main contributions are threefold. First, we construct a benchmark dataset composed of more than 45,000 music-image pairs. Human labelers are recruited to annotate whether these pairs are well matched. The results show that they generally agr…

Cited by 25 publications (10 citation statements)
References 46 publications
“…Third, large music collections contain different modalities of information, i.e., audio, images, and text, and all these data are suitable to be exploited for genre classification. Several approaches dealing with different modalities have been proposed (Wu et al., 2016; Schedl et al., 2013). However, to the best of our knowledge, no multimodal approach based on deep learning architectures has been proposed for this Music Information Retrieval (MIR) task, neither for single-label nor multi-label classification.…”
Section: Introduction
confidence: 99%
“…Co-occurring changes in audio and video content of music videos can be detected, where the correlations can be used in cross-modal audio-visual music retrieval. Lyrics-based music attributes are utilized for image representation in [16]. Cross-modal ranking analysis is suggested to learn semantic similarity between music and image, with the aim of obtaining the optimal embedding spaces for music and image.…”
Section: B Cross-modal Music Retrieval
confidence: 99%
“…As online music streaming and video sharing websites have become increasingly popular, some research has been done on the relationship between music and album covers [1,2,3,4] and also on music and videos (instead of just images) as the visual modality [5,6,7,8] to explore the multimodal information present in both types of data. A recent study [9] also explored the cross-modal relations between the two modalities but using images with people talking and speech.…”
Section: Related Work
confidence: 99%