2021 · DOI: 10.1609/aaai.v35i1.16091
Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

Abstract: To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note’s pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While different types of tokens may possess different properties, existing models …

Cited by 92 publications (64 citation statements)
References 12 publications
“…Music Transformer [14] was the first Transformer-based model in symbolic music generation, showing that a Transformer model can generate coherent minute-long polyphonic piano music with reasonable repetitions and variance. Other Transformer-based models have since been proposed to generate music [9,13,15,17,24,28,36]. These models can learn musical features without hand-crafted rules and produce diverse music pieces.…”
Section: Related Work
confidence: 99%
“…We adopt the most established Transformer encoder architecture [31], which has been frequently applied in music generation as of late [13,37]. We follow the standard Transformer encoder architecture [31] and illustrate it in Figure 2 (b).…”
Section: Transformer Encoder Architecture
confidence: 99%
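As a rough illustration of the kind of encoder these citing works describe, here is a minimal sketch using PyTorch's built-in Transformer encoder modules; the hyperparameters (d_model, number of heads and layers, sequence length) are illustrative placeholders, not the cited papers' settings:

```python
import torch
import torch.nn as nn

# Minimal Transformer encoder sketch; all sizes below are placeholders,
# not values taken from the cited papers.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# A batch of 128 already-embedded music tokens (batch, sequence, d_model).
token_embeddings = torch.randn(1, 128, 512)
hidden_states = encoder(token_embeddings)  # same shape as the input
print(hidden_states.shape)                 # torch.Size([1, 128, 512])
```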
“…REMI [36] is a common representation format that uses [bar] and [position] tokens to place tokens on a metrical grid that uniformly divides a bar into a certain number of positions and assumes symbolic timing. Based on this, Hsiao et al. [58] further employed an expansion-compression trick to convert a piece of music to a sequence of compound words by grouping neighboring tokens, significantly shortening the token sequences. As a result, we convert MIDI to the compound word format as our input.…”
Section: A MIDI To Compound Word
confidence: 99%
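To make the grouping idea concrete, a hypothetical sketch of collapsing neighboring note-level tokens into one compound word per onset; the token names and grouping rule here are illustrative only, and the actual conversion in Hsiao et al. involves the full vocabulary and type system described in the paper:

```python
# Hypothetical REMI-like token stream: each note is spelled out as several
# consecutive tokens, which we regroup into one compound word per onset.
remi_like_tokens = [
    "Bar", "Position_1", "Pitch_60", "Duration_8", "Velocity_20",
    "Position_5", "Pitch_64", "Duration_4", "Velocity_18",
]

compound_words = []
current = {}
for tok in remi_like_tokens:
    family = tok.split("_")[0]           # e.g. "Pitch" from "Pitch_60"
    if family in ("Bar", "Position"):    # a bar or a new onset starts a new group
        if current:
            compound_words.append(current)
        current = {family: tok}
    else:
        current[family] = tok            # attach note attributes to the open group
if current:
    compound_words.append(current)

print(compound_words)
# [{'Bar': 'Bar'},
#  {'Position': 'Position_1', 'Pitch': 'Pitch_60', 'Duration': 'Duration_8', 'Velocity': 'Velocity_20'},
#  {'Position': 'Position_5', 'Pitch': 'Pitch_64', 'Duration': 'Duration_4', 'Velocity': 'Velocity_18'}]
```

Grouping in this way is what shortens the sequence: one compound word stands in for several consecutive single-type tokens.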
“…As a result, we convert MIDI to the compound word format as our input. Following the practice of Hsiao et al. [58], we convert MIDI into 7 types of symbols: Tempo, Chord, Bar-beat, Type, Pitch, Duration, and Velocity. For better illustration, we show part of the melody of Joe Hisaishi's "The Sun Also Rises" and its MIDI transformed into compound words, as shown in Figure 2.…”
Section: A MIDI To Compound Word
confidence: 99%
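One way to picture the seven token families is as a single grouped record per time step, with the Type token deciding which fields are active; a minimal sketch, with made-up field values and names that are not taken from the cited papers:

```python
from dataclasses import dataclass
from typing import Optional

# One compound word carries the seven token families listed above.
# Which fields are filled depends on the Type token (metrical vs. note event).
# All example values are invented for illustration.
@dataclass
class CompoundWord:
    type: str                  # e.g. "Metrical" or "Note"
    bar_beat: Optional[str]    # e.g. "Beat_4"
    tempo: Optional[str]       # e.g. "Tempo_110"
    chord: Optional[str]       # e.g. "C_maj"
    pitch: Optional[str]       # e.g. "Pitch_64"
    duration: Optional[str]    # e.g. "Duration_8"
    velocity: Optional[str]    # e.g. "Velocity_20"

beat_event = CompoundWord("Metrical", "Beat_4", "Tempo_110", "C_maj", None, None, None)
note_event = CompoundWord("Note", None, None, None, "Pitch_64", "Duration_8", "Velocity_20")
```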