2021
DOI: 10.48550/arxiv.2101.02402
Preprint

Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

Abstract: To apply neural sequence models such as the Transformer to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note's pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While different types of tokens may possess different properties, existing models …
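
A minimal sketch may help here. The snippet below contrasts the token-per-attribute encoding the abstract describes with grouping those tokens into one compound-word-style super token; the class, token names, and grouping are illustrative assumptions, not the paper's actual vocabulary or implementation.

```python
# Sketch of the idea in the abstract: a note is described by several token
# types (pitch, duration, velocity, onset position). All names here are
# illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int      # MIDI pitch, 0-127
    duration: int   # length in quantized time steps
    velocity: int   # MIDI velocity (dynamics), 0-127
    onset: int      # placement on the time grid

def to_token_sequence(note: Note) -> list:
    """Token-per-attribute encoding: one note becomes four tokens."""
    return [("Position", note.onset), ("Pitch", note.pitch),
            ("Duration", note.duration), ("Velocity", note.velocity)]

def to_compound_word(note: Note) -> tuple:
    """Compound-word-style encoding: the same four tokens grouped into a
    single super token, shortening the sequence roughly fourfold."""
    return tuple(to_token_sequence(note))

note = Note(pitch=60, duration=4, velocity=80, onset=0)
print(len(to_token_sequence(note)), "tokens vs", 1, "compound word")
```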

Cited by 11 publications (26 citation statements) · References 22 publications

“…This new representation helps to maintain the flexibility of local tempo changes and provides a basis upon which we can control the rhythmic and harmonic structure of the music. Compound words [9] (CP) further converts REMI tokens to a sequence of compound words by grouping neighboring tokens, which greatly reduces the length of the token sequence. In this paper, we employ a representation based on CP.…”
Section: Related Work (mentioning)
confidence: 99%
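
To make the quoted length reduction concrete, here is a small sketch of grouping neighboring REMI-style tokens into compound words. The token spellings and the greedy grouping rule are assumptions for illustration, not the actual REMI or CP conversion.

```python
# Neighboring REMI-style tokens that describe the same note are merged into
# one compound word; token names and the grouping rule are assumptions.
remi_tokens = [
    "Bar", "Position_1", "Tempo_120",
    "Position_1", "Pitch_60", "Velocity_80", "Duration_4",
    "Position_5", "Pitch_64", "Velocity_80", "Duration_4",
]

def group_into_compound_words(tokens):
    """Merge a Position token and the note tokens that follow it into one
    tuple; any other token passes through as a singleton group."""
    groups, i = [], 0
    while i < len(tokens):
        if (tokens[i].startswith("Position_") and i + 3 < len(tokens)
                and tokens[i + 1].startswith("Pitch_")):
            groups.append(tuple(tokens[i:i + 4]))  # Position+Pitch+Velocity+Duration
            i += 4
        else:
            groups.append((tokens[i],))
            i += 1
    return groups

cp = group_into_compound_words(remi_tokens)
print(len(remi_tokens), "REMI tokens ->", len(cp), "compound words")  # 11 -> 5
```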
“…Some [27] [4] consider piano rolls as 2-D images and build models based on convolution networks. Since music and language are both represented as sequences, the transformer and its variants are also frequently used as the backbone of music generation models [10] [11] [3] [9]. Apart from generating symbolic music, some models generate audio directly in waveform [15] [5] [14] or indirectly through transcription and synthesis [7].…”
Section: Related Work (mentioning)
confidence: 99%
“…The capabilities of symbolic music generative models have been steadily improving with notable contributions by Huang et al (2018), Payne (2019), Huang & Yang (2020) and Hsiao et al (2021). This line of work focuses on improving the quality of generated samples but does not contribute substantially toward controllable generation.…”
Section: Related Work (mentioning)
confidence: 99%
“…In REMI, a set of note-on, note-off events of the same note can be represented as a single token, which resembles note value in musical scores. Recent research has also revealed that the sequence model can generate note-level representations in an efficient form where related tokens are grouped [10]. In this paper, we design notation-level token representations that correspond to a musical score.…”
Section: Related Work, 2.1 Musical Score Generation (mentioning)
confidence: 99%
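
The conversion this quote alludes to, collapsing MIDI-style note-on/note-off pairs into single notes with explicit durations (closer to the note values written in a score), can be sketched as follows; the event format and pairing rule are illustrative assumptions, not the cited paper's procedure.

```python
# Collapse (note-on, note-off) event pairs into (onset, pitch, duration)
# note tuples; assumes at most one active note per pitch for simplicity.
events = [  # (time_step, kind, pitch)
    (0, "on", 60), (4, "off", 60),
    (4, "on", 64), (8, "off", 64),
]

def pair_on_off(events):
    """Match each note-on with the next note-off of the same pitch."""
    pending, notes = {}, []
    for t, kind, pitch in events:
        if kind == "on":
            pending[pitch] = t
        elif kind == "off" and pitch in pending:
            onset = pending.pop(pitch)
            notes.append((onset, pitch, t - onset))
    return notes

print(pair_on_off(events))  # [(0, 60, 4), (4, 64, 4)]
```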
“…In the research area of music generation, many studies have reported that sequence models, such as the Transformer [25], fit the task quite well [10][11][12]. Although many Transformer variants have been proposed [23], we adopt the vanilla Transformer model to utilize its original attention mechanism.…”
Section: Model (mentioning)
confidence: 99%
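
As a rough illustration of the modeling choice in this quote, a vanilla Transformer encoder with the original scaled dot-product attention can be instantiated directly in PyTorch; the vocabulary size and hyperparameters below are placeholders, not the cited configuration.

```python
# Minimal vanilla Transformer encoder over a token sequence (placeholder
# hyperparameters; not the configuration used in the cited work).
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 256
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=4)

tokens = torch.randint(0, vocab_size, (2, 128))  # (batch, sequence length)
hidden = encoder(embed(tokens))                  # (2, 128, d_model)
print(hidden.shape)
```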