Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475195
Video Background Music Generation with Controllable Music Transformer

Abstract: In this work, we address the task of video background music generation. Some previous works achieve effective music generation but are unable to generate melodious music tailored to a particular video, and none of them considers the video-music rhythmic consistency. To generate the background music that matches the given video, we first establish the rhythmic relations between video and background music. In particular, we connect timing, motion speed, and motion saliency from video with beat, simu-note density…
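The abstract describes mapping visual rhythm cues (e.g., motion speed) to musical attributes such as simu-note density. A minimal sketch of that idea, not the paper's exact algorithm: pool a per-frame motion-magnitude signal into windows and quantize it into discrete density levels that a controllable transformer could condition on. All function and parameter names here are illustrative assumptions.

```python
import numpy as np

def motion_to_rhythm_controls(motion_mag, fps=30, n_levels=4):
    """Illustrative sketch (not the paper's exact method): quantize
    per-frame motion magnitude into discrete note-density levels.

    motion_mag : 1-D array of per-frame motion magnitudes (e.g. a mean
                 optical-flow norm); the name is a hypothetical stand-in.
    Returns one density level in 0..n_levels-1 per one-second window.
    """
    motion_mag = np.asarray(motion_mag, dtype=float)
    # Average motion over one-second windows (a stand-in for beat-aligned pooling).
    n_windows = len(motion_mag) // fps
    pooled = motion_mag[: n_windows * fps].reshape(n_windows, fps).mean(axis=1)
    # Normalize to [0, 1], then quantize to discrete density levels.
    rng = pooled.max() - pooled.min()
    norm = (pooled - pooled.min()) / rng if rng > 0 else np.zeros_like(pooled)
    return np.minimum((norm * n_levels).astype(int), n_levels - 1)

# Example: 3 seconds of synthetic motion rising from still to fast.
frames = np.concatenate([np.full(30, 0.1), np.full(30, 0.5), np.full(30, 1.0)])
print(motion_to_rhythm_controls(frames))  # → [0 1 3] under these assumptions
```

Low-motion windows map to sparse note density and high-motion windows to dense passages, which is the kind of video-music rhythmic consistency the abstract targets.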

Cited by 52 publications (28 citation statements)
References 13 publications
“…Dance2Music [1]: Similar to [16], the generated music with this method is also monotonic in terms of the musical instrument. Controllable Music Transformer (CMT) [10]: CMT is a Transformer-based model proposed for video background music generation using MIDI representation. In addition to the above cross-modality models that are closely related to our work, we also consider Ground Truth: GT samples are the original music from dance videos.…”
Section: Methods
confidence: 99%
“…Gan et al [16] propose a graph-based transformer framework to generate music from performance videos using raw movement as input. Di et al [10] propose to generate video background music conditioned on the motion and special timing/rhythmic features of the input videos. In contrast to these previous works, our work combines three modalities, which takes the vision and motion data as input and generates music accordingly.…”
Section: Audio, Vision and Motion
confidence: 99%
“…Fine-grained control has been a topic of interest in the recent literature (Choi et al, 2020; Hadjeres & Crestel, 2020; Wu & Yang, 2021; Di et al, 2021; Ferreira & Whitehead, 2021) and is an essential property when considering user-directed applications. In essence, fine-grained control is necessary to allow control over salient features in the generation, as saliency in music at least partly lies in how it changes over time.…”
Section: Related Work
confidence: 99%
“…Wang et al proposed PianoTree VAE [31], which uses GRU to encode notes played at the same time and map them to a latent space to achieve controllable generation of polyphonic music based on a tree structure. Di et al achieved rhythmic consistency between video and background music and proposed Controllable Music Transformer [12] to locally control the rhythm while globally controlling the music genre and instruments.…”
Section: Controllable Music Generation
confidence: 99%