2021
DOI: 10.48550/arxiv.2107.05223
Preprint

MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding

Abstract: This paper presents an attempt to employ the masked language modeling approach of BERT to pre-train a 12-layer Transformer model over 4,166 pieces of polyphonic piano MIDI files for tackling a number of symbolic-domain discriminative music understanding tasks. These include two note-level classification tasks, i.e., melody extraction and velocity prediction, as well as two sequence-level classification tasks, i.e., composer classification and emotion classification. We find that, given a pretrained Transformer, …

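The pre-training objective described in the abstract is BERT's masked language modeling, applied to MIDI token sequences instead of text. The sketch below illustrates that objective in PyTorch; the vocabulary size, model width, masking ratio, and special-token ids are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 800    # assumed MIDI token vocabulary size (illustrative)
MASK_ID = 1         # assumed id of the special [MASK] token
MASK_PROB = 0.15    # BERT's standard masking ratio

class MidiMLM(nn.Module):
    """12-layer Transformer encoder with a masked-token prediction head."""
    def __init__(self, vocab_size=VOCAB_SIZE, d_model=256, n_layers=12,
                 max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, dim_feedforward=1024, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        hidden = self.encoder(self.embed(tokens) + self.pos(positions))
        return self.head(hidden)

def mlm_step(model, tokens, optimizer):
    """One pre-training step: mask ~15% of positions, then train the
    model to recover the original tokens at exactly those positions."""
    mask = torch.rand(tokens.shape) < MASK_PROB
    inputs = tokens.masked_fill(mask, MASK_ID)
    targets = tokens.masked_fill(~mask, -100)   # loss ignores unmasked
    logits = model(inputs)
    loss = nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1),
        ignore_index=-100)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a batch of 4 random "MIDI token" sequences of length 128.
model = MidiMLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
batch = torch.randint(2, VOCAB_SIZE, (4, 128))  # ids 0/1 reserved
print(mlm_step(model, batch, opt))
```

Computing the loss only at the masked positions is what lets the bidirectional encoder learn token dependencies without trivially copying its input.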
Cited by 8 publications (22 citation statements)
References 48 publications

“…First, music grammar is learned in the pre-training stage, and then specific tasks are learned in the fine-tuning stage. This process is similar to the work of [8,33,6].…”
Section: Model Architecture (mentioning)
confidence: 66%
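As a rough sketch of the two-stage recipe in the statement above, fine-tuning replaces the masked-token head with a small task head on top of the pre-trained encoder: one label per token for note-level tasks and one label per piece for sequence-level tasks. The head shapes and the mean-pooling strategy below are assumptions for illustration, not necessarily how MidiBERT-Piano aggregates sequence representations.

```python
import torch
import torch.nn as nn

# Stand-in encoder; in practice the weights would be loaded from the
# pre-training checkpoint rather than randomly initialized as here.
d_model, vocab, n_note_classes, n_emotions = 256, 800, 2, 4
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                               dim_feedforward=1024, batch_first=True),
    num_layers=12)

# Note-level head: one label per token (e.g., melody vs. accompaniment).
note_head = nn.Linear(d_model, n_note_classes)
# Sequence-level head: one label per piece (e.g., a 4-class emotion).
seq_head = nn.Linear(d_model, n_emotions)

tokens = torch.randint(0, vocab, (2, 128))
hidden = encoder(embed(tokens))            # (batch, seq, d_model)

note_logits = note_head(hidden)            # (batch, seq, n_note_classes)
seq_logits = seq_head(hidden.mean(dim=1))  # mean-pool, classify the piece
print(note_logits.shape, seq_logits.shape)

# Fine-tuning minimizes cross-entropy on these logits, updating both the
# task head and the pre-trained encoder weights.
```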
“…This strategy improves the efficiency of Transformer-based [42] architectures due to the decreased input sequence length, which reduces the computational complexity. Recent studies show that CP achieves better output quality than the aforementioned representations in certain tasks, such as conditional/unconditional piano generation [18,8] and emotion recognition [21].…”
Section: Data Encoding Representation (mentioning)
confidence: 99%
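The length reduction that the statement above attributes to Compound Word (CP) representation comes from merging the attributes of one musical event into a single sequence position. A minimal sketch follows, assuming a toy three-field note event; the real CP vocabulary has more fields (e.g., bar, position, tempo), so the exact ratio differs.

```python
# Illustrative sketch of Compound Word (CP) grouping: the attributes of
# one note share a single sequence position, shrinking the input the
# Transformer must attend over. Field names and sizes are assumptions.
from dataclasses import dataclass

@dataclass
class NoteEvent:
    pitch: int      # MIDI pitch 0-127
    duration: int   # quantized duration bin
    velocity: int   # quantized velocity bin

notes = [NoteEvent(60, 4, 20), NoteEvent(64, 2, 22), NoteEvent(67, 2, 18)]

# REMI-style flat stream: one token per attribute -> 3 positions per note.
flat = [tok for n in notes for tok in (("Pitch", n.pitch),
                                       ("Dur", n.duration),
                                       ("Vel", n.velocity))]

# CP-style stream: one compound token per note -> 1 position per note.
compound = [(n.pitch, n.duration, n.velocity) for n in notes]

print(len(flat), len(compound))  # 9 vs. 3 positions
# Since self-attention cost is quadratic in sequence length, a ~3x
# shorter CP sequence cuts attention compute by roughly an order of
# magnitude, matching the efficiency claim in the citing statement.
```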
“…This approach offers simplicity and efficiency. Drawing inspiration from the remarkable achievements of BERT, Chou et al. [5] introduced MidiBERT-Piano, a large-scale pre-trained model utilizing CP representation. The proposed model shows promising results across several tasks, including symbolic music emotion recognition.…”
Section: MER With Symbolic-only (mentioning)
confidence: 99%
“…Existing research mainly applies deep-learning-based methods in the acoustic domain or sequence-modeling methods on symbolic-domain representations of the music. In their recent publication on emotion recognition in symbolic music, Qiu et al. [31] introduced a pioneering approach utilizing the MIDIBERT model [4], a large-scale pre-trained music understanding model. At present, no existing research on Music Emotion Recognition (MER) for instrumental music integrates both acoustic and symbolic analyses.…”
Section: Introduction (mentioning)
confidence: 99%