2021
DOI: 10.48550/arxiv.2101.02402
Preprint

Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

Abstract: To apply neural sequence models such as the Transformer to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note's pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While different types of tokens may possess different properties, existing models …
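
A minimal sketch may help here. The snippet below contrasts the token-per-attribute encoding the abstract describes with grouping those tokens into one compound-word-style super token; the class, token names, and grouping are illustrative assumptions, not the paper's actual vocabulary or implementation.

```python
# Sketch of the idea in the abstract: a note is described by several token
# types (pitch, duration, velocity, onset position). All names here are
# illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int      # MIDI pitch, 0-127
    duration: int   # length in quantized time steps
    velocity: int   # MIDI velocity (dynamics), 0-127
    onset: int      # placement on the time grid

def to_token_sequence(note: Note) -> list:
    """Token-per-attribute encoding: one note becomes four tokens."""
    return [("Position", note.onset), ("Pitch", note.pitch),
            ("Duration", note.duration), ("Velocity", note.velocity)]

def to_compound_word(note: Note) -> tuple:
    """Compound-word-style encoding: the same four tokens grouped into a
    single super token, shortening the sequence roughly fourfold."""
    return tuple(to_token_sequence(note))

note = Note(pitch=60, duration=4, velocity=80, onset=0)
print(len(to_token_sequence(note)), "tokens vs", 1, "compound word")
```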

Cited by 11 publications (26 citation statements) · References 22 publications

“…This new representation helps to maintain the flexibility of local tempo changes and provides a basis upon which we can control the rhythmic and harmonic structure of the music. Compound words [9] (CP) further converts REMI tokens to a sequence of compound words by grouping neighboring tokens, which greatly reduces the length of the token sequence. In this paper, we employ a representation based on CP.…”
Section: Related Work (mentioning)
confidence: 99%
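
To make the quoted length reduction concrete, here is a small sketch of grouping neighboring REMI-style tokens into compound words. The token spellings and the greedy grouping rule are assumptions for illustration, not the actual REMI or CP conversion.

```python
# Neighboring REMI-style tokens that describe the same note are merged into
# one compound word; token names and the grouping rule are assumptions.
remi_tokens = [
    "Bar", "Position_1", "Tempo_120",
    "Position_1", "Pitch_60", "Velocity_80", "Duration_4",
    "Position_5", "Pitch_64", "Velocity_80", "Duration_4",
]

def group_into_compound_words(tokens):
    """Merge a Position token and the note tokens that follow it into one
    tuple; any other token passes through as a singleton group."""
    groups, i = [], 0
    while i < len(tokens):
        if (tokens[i].startswith("Position_") and i + 3 < len(tokens)
                and tokens[i + 1].startswith("Pitch_")):
            groups.append(tuple(tokens[i:i + 4]))  # Position+Pitch+Velocity+Duration
            i += 4
        else:
            groups.append((tokens[i],))
            i += 1
    return groups

cp = group_into_compound_words(remi_tokens)
print(len(remi_tokens), "REMI tokens ->", len(cp), "compound words")  # 11 -> 5
```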
“…Some [27] [4] consider piano rolls as 2-D images and build models based on convolution networks. Since music and language are both represented as sequences, the transformer and its variants are also frequently used as the backbone of music generation models [10] [11] [3] [9]. Apart from generating symbolic music, some models generate audio directly in waveform [15] [5] [14] or indirectly through transcription and synthesis [7].…”
Section: Related Work (mentioning)
confidence: 99%
“…The capabilities of symbolic music generative models have been steadily improving with notable contributions by Huang et al (2018), Payne (2019), Huang & Yang (2020) and Hsiao et al (2021). This line of work focuses on improving the quality of generated samples but does not contribute substantially toward controllable generation.…”
Section: Related Work (mentioning)
confidence: 99%
“…In REMI, a set of note-on, note-off events of the same note can be represented as a single token, which resembles note value in musical scores. Recent research has also revealed that the sequence model can generate note-level representations in an efficient form where related tokens are grouped [10]. In this paper, we design notation-level token representations that correspond to a musical score.…”
Section: Related Work, 2.1 Musical Score Generation (mentioning)
confidence: 99%
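
The conversion this quote alludes to, collapsing MIDI-style note-on/note-off pairs into single notes with explicit durations (closer to the note values written in a score), can be sketched as follows; the event format and pairing rule are illustrative assumptions, not the cited paper's procedure.

```python
# Collapse (note-on, note-off) event pairs into (onset, pitch, duration)
# note tuples; assumes at most one active note per pitch for simplicity.
events = [  # (time_step, kind, pitch)
    (0, "on", 60), (4, "off", 60),
    (4, "on", 64), (8, "off", 64),
]

def pair_on_off(events):
    """Match each note-on with the next note-off of the same pitch."""
    pending, notes = {}, []
    for t, kind, pitch in events:
        if kind == "on":
            pending[pitch] = t
        elif kind == "off" and pitch in pending:
            onset = pending.pop(pitch)
            notes.append((onset, pitch, t - onset))
    return notes

print(pair_on_off(events))  # [(0, 60, 4), (4, 64, 4)]
```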
“…In the research area of music generation, many studies have reported that sequence models, such as the Transformer [25], fit the task quite well [10][11][12]. Although many Transformer variants have been proposed [23], we adopt the vanilla Transformer model to utilize its original attention mechanism.…”
Section: Model (mentioning)
confidence: 99%
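
As a rough illustration of the modeling choice in this quote, a vanilla Transformer encoder with the original scaled dot-product attention can be instantiated directly in PyTorch; the vocabulary size and hyperparameters below are placeholders, not the cited configuration.

```python
# Minimal vanilla Transformer encoder over a token sequence (placeholder
# hyperparameters; not the configuration used in the cited work).
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 256
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=4)

tokens = torch.randint(0, vocab_size, (2, 128))  # (batch, sequence length)
hidden = encoder(embed(tokens))                  # (2, 128, d_model)
print(hidden.shape)
```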