2021 · DOI: 10.1609/aaai.v35i1.16091
Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

Abstract: To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note’s pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While different types of tokens may possess different properties, existing models …

Cited by 92 publications (64 citation statements)
References 12 publications
“…Music Transformer [14] was the first Transformer-based model in symbolic music generation, showing that a Transformer model can generate coherent minute-long polyphonic piano music with reasonable repetitions and variance. Other Transformer-based models have since been proposed to generate music [9,13,15,17,24,28,36]. These models can learn musical features without hand-crafted rules and produce diverse music pieces.…”
Section: Related Work
confidence: 99%
“…We adopt the most established Transformer encoder architecture [31], which has been frequently applied in music generation as of late [13,37]. We follow the standard Transformer encoder architecture [31] and illustrate it in Figure 2 (b).…”
Section: Transformer Encoder Architecture
confidence: 99%
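As a rough illustration of the kind of encoder these citing works describe, here is a minimal sketch using PyTorch's built-in Transformer encoder modules; the hyperparameters (d_model, number of heads and layers, sequence length) are illustrative placeholders, not the cited papers' settings:

```python
import torch
import torch.nn as nn

# Minimal Transformer encoder sketch; all sizes below are placeholders,
# not values taken from the cited papers.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# A batch of 128 already-embedded music tokens (batch, sequence, d_model).
token_embeddings = torch.randn(1, 128, 512)
hidden_states = encoder(token_embeddings)  # same shape as the input
print(hidden_states.shape)                 # torch.Size([1, 128, 512])
```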
“…REMI [36] is a common representation format that uses [bar] and [position] tokens to place tokens on a metrical grid that uniformly divides a bar into a certain number of positions and assumes symbolic timing. Based on this, Hsiao et al. [58] further employed an expansion-compression trick to convert a piece of music to a sequence of compound words by grouping neighboring tokens, significantly shortening the token sequences. As a result, we convert MIDI to the compound word format as our input.…”
Section: A MIDI To Compound Word
confidence: 99%
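To make the grouping idea concrete, a hypothetical sketch of collapsing neighboring note-level tokens into one compound word per onset; the token names and grouping rule here are illustrative only, and the actual conversion in Hsiao et al. involves the full vocabulary and type system described in the paper:

```python
# Hypothetical REMI-like token stream: each note is spelled out as several
# consecutive tokens, which we regroup into one compound word per onset.
remi_like_tokens = [
    "Bar", "Position_1", "Pitch_60", "Duration_8", "Velocity_20",
    "Position_5", "Pitch_64", "Duration_4", "Velocity_18",
]

compound_words = []
current = {}
for tok in remi_like_tokens:
    family = tok.split("_")[0]           # e.g. "Pitch" from "Pitch_60"
    if family in ("Bar", "Position"):    # a bar or a new onset starts a new group
        if current:
            compound_words.append(current)
        current = {family: tok}
    else:
        current[family] = tok            # attach note attributes to the open group
if current:
    compound_words.append(current)

print(compound_words)
# [{'Bar': 'Bar'},
#  {'Position': 'Position_1', 'Pitch': 'Pitch_60', 'Duration': 'Duration_8', 'Velocity': 'Velocity_20'},
#  {'Position': 'Position_5', 'Pitch': 'Pitch_64', 'Duration': 'Duration_4', 'Velocity': 'Velocity_18'}]
```

Grouping in this way is what shortens the sequence: one compound word stands in for several consecutive single-type tokens.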
“…As a result, we convert MIDI to the compound word format as our input. Following the practice of Hsiao et al. [58], we convert MIDI into 7 types of symbols: Tempo, Chord, Bar-beat, Type, Pitch, Duration, and Velocity. For better illustration, we show part of the melody of Joe Hisaishi's "The Sun Also Rises" and its MIDI transformed into compound words, as shown in Figure 2.…”
Section: A MIDI To Compound Word
confidence: 99%
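One way to picture the seven token families is as a single grouped record per time step, with the Type token deciding which fields are active; a minimal sketch, with made-up field values and names that are not taken from the cited papers:

```python
from dataclasses import dataclass
from typing import Optional

# One compound word carries the seven token families listed above.
# Which fields are filled depends on the Type token (metrical vs. note event).
# All example values are invented for illustration.
@dataclass
class CompoundWord:
    type: str                  # e.g. "Metrical" or "Note"
    bar_beat: Optional[str]    # e.g. "Beat_4"
    tempo: Optional[str]       # e.g. "Tempo_110"
    chord: Optional[str]       # e.g. "C_maj"
    pitch: Optional[str]       # e.g. "Pitch_64"
    duration: Optional[str]    # e.g. "Duration_8"
    velocity: Optional[str]    # e.g. "Velocity_20"

beat_event = CompoundWord("Metrical", "Beat_4", "Tempo_110", "C_maj", None, None, None)
note_event = CompoundWord("Note", None, None, None, "Pitch_64", "Duration_8", "Velocity_20")
```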