MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding

Chou, Yi-Hui; Chen, I-Chun; Chang, Chin-Jui; Ching, Joann; Yang, Yi-Hsuan

doi:10.48550/arxiv.2107.05223

Cited by 8 publications

(22 citation statements)

References 48 publications

“…Firstly, music grammar is learned in the pretraining stage and then specific tasks are learned in the finetuning stage. This process is similar to the work of [8,33,6].…”

Section: Model Architecture (mentioning)

confidence: 66%

MusIAC: An extensible generative framework for Music Infilling Applications with multi-level Control

Guo, Simpson, Kiefer et al. 2022

Preprint

We present a novel music generation framework for music infilling, with a user-friendly interface. Infilling refers to the task of generating musical sections given the surrounding multi-track music. The proposed transformer-based framework is extensible to new control tokens, as shown by the music control tokens added in this work, such as tonal tension per bar and track polyphony level. We explore the effects of including several musically meaningful control tokens, and evaluate the results using objective metrics related to pitch and rhythm. Our results demonstrate that adding control tokens helps to generate music with stronger stylistic similarities to the original music. It also gives the user more control over properties such as the music texture and tonal tension in each bar, compared to previous research, which only provided control for track density. We present the model in a Google Colab notebook to enable interactive generation.

“…This strategy improves the efficiency of Transformer-based [42] architectures due to the decreased input sequence length which reduces the computational complexity. Recent studies show that CP achieves a better output quality compared to the aforementioned representations in certain tasks such as conditional/unconditional piano generation [18,8] and emotion recognition [21].…”

Section: Data Encoding Representation (mentioning)

confidence: 99%

Conditional Drums Generation using Compound Word Representations

Makris, Guo, Καλιακάτσος-Παπακώστας et al. 2022

Preprint

The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures. When using any deep learning model which considers music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle the task of conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process of sequential data. We present a sequence-to-sequence architecture where a Bidirectional Long Short-Term Memory (BiLSTM) Encoder receives information about the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based Decoder with relative global attention produces the generated drum sequences. We conducted experiments to thoroughly compare the effectiveness of our method to several baselines. Quantitative evaluation shows that our model is able to generate drum sequences that have similar statistical distributions and characteristics to the training corpus. These features include syncopation, compression ratio, and symmetry, among others. We also verified, through a listening test, that generated drum sequences sound pleasant, natural and coherent while they "groove" with the given accompaniment.

“…This approach offers simplicity and efficiency. Drawing inspiration from the remarkable achievements of BERT, Chou et al. [5] introduced MidiBERT-Piano, a large-scale pre-trained model utilizing CP representation. The proposed model showcases promising outcomes in various domains, including symbolic music emotion recognition.…”

Section: MER With Symbolic-only (mentioning)

confidence: 99%

“…Existing researches mainly apply deep-learning-based methods on the acoustic domain or uses sequence-modeling methods on the symbolic domain representations of the music. In their recent publication on emotion recognition in symbolic music, Qiu et al. [31] introduced a pioneering approach utilizing the MIDIBERT model [4], a large-scale pre-trained music understanding model. At present, no existing research on Music Emotion Recognition (MER) for instrumental music integrates both acoustic and symbolic analyses.…”

Section: Introduction (mentioning)

confidence: 99%

Beijing Qihoo Technology Co., Ltd. v. Tencent Technology (Shenzhen) Co., Ltd. and Shenzhen Tencent Computer System Co., Ltd. (Dispute over the Abuse of Market Dominant Position)—Analysis Methods and Ideas for the Definition of the Relevant Markets and the Abuse of Market Dominant Position in the Internet Environment

Zhu

2019

Library of Selected Cases From the Chinese Court

Natural language processing models based on neural networks are vulnerable to adversarial examples. These adversarial examples are imperceptible to human readers but can mislead models to make the wrong predictions. In a black-box setting, attacker can fool the model without knowing model's parameters and architecture. Previous works on word-level attacks widely use single semantic space and greedy search as a search strategy. However, these methods fail to balance the attack success rate, quality of adversarial examples and time consumption. In this paper, we propose BeamAttack, a textual attack algorithm that makes use of mixed semantic spaces and improved beam search to craft high-quality adversarial examples. Extensive experiments demonstrate that BeamAttack can improve attack success rate while saving numerous queries and time, e.g., improving at most 7% attack success rate than greedy search when attacking the examples from MR dataset. Compared with heuristic search, BeamAttack can save at most 85% model queries and achieve a competitive attack success rate. The adversarial examples crafted by BeamAttack are highly transferable and can effectively improve model's robustness during adversarial training. Code is available at https://github.com/zhuhai-ustc/beamattack/tree/master
