Background music not only provides an auditory experience for users but also conveys, guides, and promotes emotions that resonate with the visual content. Studies on how to synthesize background music for different scenes can advance research in many fields, such as human behaviour research. Although considerable effort has been directed toward music synthesis, synthesizing appropriate music based on the visual content of a scene remains an open problem. In this paper, we introduce an interactive background music synthesis algorithm guided by visual content. We leverage a cascading strategy to synthesize background music in two stages: Scene Visual Analysis and Background Music Synthesis. First, adopting a deep learning-based solution, we use neural networks to analyze the sentiment of the input scene. Second, real-time background music is synthesized by optimizing a cost function that guides the selection and transition of music clips, maximizing both the emotional consistency between the visual and auditory channels and the continuity of the music. In our experiments, we demonstrate that the proposed approach can synthesize dynamic background music for different types of scenarios. We also conduct quantitative and qualitative analyses of the synthesized results for multiple example scenes to validate the efficacy of our approach.

CCS CONCEPTS
• Applied computing → Sound and music computing; • Human-centered computing → Virtual reality.
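The second stage described above, selecting and transitioning music clips by minimizing a cost function, can be illustrated with a short sketch. The Python snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each clip carries an emotion embedding plus boundary-feature vectors (e.g., tempo, key, loudness at its start and end), and scores candidates with a weighted sum of an emotion-mismatch term and a transition-continuity penalty. All names, features, distance choices, and weights here are hypothetical.

```python
# A minimal sketch of cost-based clip selection: each candidate clip is
# scored by (a) emotional distance to the analyzed scene sentiment and
# (b) a continuity penalty against the currently playing clip.
# Names, weights, and distance choices are illustrative assumptions.

import numpy as np

def emotion_cost(clip_emb: np.ndarray, scene_emb: np.ndarray) -> float:
    """Cosine distance between a clip's emotion embedding and the scene's."""
    return 1.0 - float(np.dot(clip_emb, scene_emb) /
                       (np.linalg.norm(clip_emb) * np.linalg.norm(scene_emb)))

def continuity_cost(prev_tail: np.ndarray, next_head: np.ndarray) -> float:
    """Penalty for an abrupt transition: L2 gap between boundary features
    (e.g., tempo, key, loudness) of the outgoing and incoming clips."""
    return float(np.linalg.norm(prev_tail - next_head))

def select_next_clip(scene_emb, clips, prev_clip=None,
                     w_emotion=1.0, w_continuity=0.5):
    """Pick the clip minimizing the combined cost.

    clips: list of dicts with 'emotion' (embedding) and 'head'/'tail'
           boundary-feature vectors. The weights are hypothetical knobs
           trading emotional consistency against musical continuity.
    """
    best, best_cost = None, float("inf")
    for clip in clips:
        cost = w_emotion * emotion_cost(clip["emotion"], scene_emb)
        if prev_clip is not None:
            cost += w_continuity * continuity_cost(prev_clip["tail"],
                                                   clip["head"])
        if cost < best_cost:
            best, best_cost = clip, cost
    return best
```

Run at each scene update, such a greedy step keeps the soundtrack responsive in real time; a more global scheme (e.g., dynamic programming over a window of future scene states) would trade latency for smoother long-range transitions.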