Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413894

Scene-Aware Background Music Synthesis

Abstract: Background music not only provides auditory experience for users, but also conveys, guides, and promotes emotions that resonate with visual contents. Studies on how to synthesize background music for different scenes can promote research in many fields, such as human behaviour research. Although considerable effort has been directed toward music synthesis, the synthesis of appropriate music based on scene visual content remains an open problem. In this paper we introduce an interactive background music synthes…

Cited by 13 publications (4 citation statements)
References 42 publications
“…Museformer, proposed by Yu et al [YLW*22] also use a transformer that incorporates novel fine‐ and coarse‐grained attention mechanisms for music generation, capturing both music structure‐related correlations and additional contextual information, leading to high‐quality, well‐structured long music sequences. An interesting work by Wang et al [WLL*20] introduces an algorithm for synthesizing interactive background music based on visual content. Using neural networks for scene sentiment analysis and a cost function for music synthesis, it ensures emotional consistency between visual and auditory elements, as well as music continuity.…”
Section: Multi‐modal Datasets Of Performing Music
Citation type: mentioning
confidence: 99%
“…To begin, they use neural networks to assess the emotion of the source scene in search of a deep learning-based solution. Second, to improve the consistency of feeling between auditory and visual criteria and music continuity, actual background music is generated by maximizing objective functions that directs the choosing and movement of music clips [18]. To create background music that matches the provided video.…”
Section: Video Background Music Recognition
Citation type: mentioning
confidence: 99%
“…Additionally, it may be used to propose background music depending on the artist's speech emotions. Research into how to synthesize background music for various scenes might aid in the advancement of knowledge in a variety of fields, including human behavior (Wang et al;2020). Even though considerable study has been conducted on music synthesis, the challenge of selecting appropriate music based on scene emotion remains unsolved.…”
Section: Motivation
Citation type: mentioning
confidence: 99%
“…This paper offers an introduction to multimodal music emotion detection and highlights the importance of audio and lyrics as key elements for categorizing music based on its emotional content, as well as advocating more study in this field utilizing deep learning. Wang et al (2020) recently published research in which they contributed to a technique for autonomously synthesizing real-time background music while a user navigates around a virtual setting. This technique, which is based on visual sentiment analysis, creates music that corresponds to the emotional states conveyed in the picture while allowing for a seamless transition.…”
Section: Cnn-lstm
Citation type: mentioning
confidence: 99%