Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3414032

PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music

Abstract: Definitive embeddings remain a fundamental challenge of computational musicology for symbolic music in deep learning today. Analogous to natural language, music can be modeled as a sequence of tokens. This motivates the majority of existing solutions to explore word embedding models for building music embeddings. However, music differs from natural language in two key aspects: (1) a musical token is multi-faceted, comprising pitch, rhythm, and dynamics information; and (2) musical context is t…

Cited by 36 publications (32 citation statements)
References 19 publications
“…Additionally, it would be interesting to complete on some of the systems studied in this paper that are concerned with automating one single musical task to turn them into full composers. For example, rhythm patterns learned in [37] can be embedded within the process of generating further complete music compositions. In this survey we already highlighted the merge of different CI techniques together such as rule-based with GA and AIS, and such as CBR with Markov chains.…”
Section: Discussion (mentioning)
confidence: 99%
“…The extracted patterns then serve as the basis for chromosome representation. Hongru Liang et al [37] developed a rhythm learning model based on ANNs. Table. 1 summarizes the frequently used CI techniques for automating each music composition task.…”
Section: Rhythm (mentioning)
confidence: 99%
“…Huang et al (2016); Madjiheurem et al (2016) regard chords as words in NLP and learn chords representations using the word2vec model. Herremans and Chuan (2017); Chuan et al (2020); Liang et al (2020) divide music pieces into non-overlapping music slices with a fixed duration and train the embeddings for each slice. Hirai and Sawada (2019) cluster musical notes into groups and regard such groups as words for representation learning.…”
Section: Symbolic Music Understanding (mentioning)
confidence: 99%
“…Similar to natural language, music is usually represented in symbolic data format (e.g., MIDI) (Jackendoff, 2009; McMullen and Saffran, 2004) with sequential tokens, and some methods (Mikolov et al, 2013a,b) from NLP can be adopted for symbolic music understanding. Since the labeled training data for each music understanding task is usually scarce, previous works (Liang et al, 2020; Chuan et al, 2020) leverage unlabeled music data to learn music token embeddings, similar to word embeddings in natural language tasks. Unfortunately, due to their shallow structures and limited unlabeled data, such embedding-based approaches have limited capability to learn powerful music representations.…”
Section: Introduction (mentioning)
confidence: 99%
“…Robust Training: Robust training has shown to be effective to improve the robustness of the models in computer vision (Szegedy et al, 2013). In Natural Language Processing, it involves augmenting the training data with carefully crafted noisy examples: semantically equivalent word substitutions (Alzantot et al, 2018), paraphrasing (Iyyer et al, 2018; Ribeiro et al, 2018), character-level noise (Ebrahimi et al, 2018b; Tan et al, 2020a,b), or perturbations at embedding space (Miyato et al, 2016; Liang et al, 2020). Inspired by Lei et al (2017) that nicely captures the semantic interactions in discourse relation, we regard noise as a disruptor to break semantic interactions and propose our CER approach to mitigate this phenomenon.…”
Section: Related Work (mentioning)
confidence: 99%