Pop Music Transformer

Huang, Yu-Siang; Yang, Yi-Hsuan

doi:10.1145/3394171.3413671

Cited by 193 publications

(114 citation statements)

References 14 publications

Supporting

Mentioning

114

Contrasting

Order By: Relevance

“…Music generation has been widely addressed as a deep learning task ( Briot et al, 2017 ), in particular using LSTMs ( Sturm et al, 2016 ; Wu et al, 2019 ) and more recently transformers Huang et al (2018) . Music tagged with emotion has also been generated through long short-term memory networks (LSTMs) with logistic regression and used to generate music with sentiment ( Ferreira and Whitehead, 2019 ).…”

Section: Related Workmentioning

confidence: 99%

Before, Between, and After: Enriching Robot Communication Surrounding Collaborative Creative Activities

2021

View full text Add to dashboard Cite

Research in creative robotics continues to expand across all creative domains, including art, music and language. Creative robots are primarily designed to be task specific, with limited research into the implications of their design outside their core task. In the case of a musical robot, this includes when a human sees and interacts with the robot before and after the performance, as well as in between pieces. These non-musical interaction tasks such as the presence of a robot during musical equipment set up, play a key role in the human perception of the robot however have received only limited attention. In this paper, we describe a new audio system using emotional musical prosody, designed to match the creative process of a musical robot for use before, between and after musical performances. Our generation system relies on the creation of a custom dataset for musical prosody. This system is designed foremost to operate in real time and allow rapid generation and dialogue exchange between human and robot. For this reason, the system combines symbolic deep learning through a Conditional Convolution Variational Auto-encoder, with an emotion-tagged audio sampler. We then compare this to a SOTA text-to-speech system in our robotic platform, Shimon the marimba player.We conducted a between-groups study with 100 participants watching a musician interact for 30 s with Shimon. We were able to increase user ratings for the key creativity metrics; novelty and coherence, while maintaining ratings for expressivity across each implementation. Our results also indicated that by communicating in a form that relates to the robot’s core functionality, we can raise likeability and perceived intelligence, while not altering animacy or anthropomorphism. These findings indicate the variation that can occur in the perception of a robot based on interactions surrounding a performance, such as initial meetings and spaces between pieces, in addition to the core creative algorithms.

show abstract

Section: Related Workmentioning

confidence: 99%

Before, Between, and After: Enriching Robot Communication Surrounding Collaborative Creative Activities

2021

View full text Add to dashboard Cite

show abstract

“…A typical use of the Transformer architecture in NLP is to encode the meaning of a word given the surrounding words, sentences, and paragraphs. Beyond NLP, other example uses of the Transformer architecture are found in music generation 43 , image generation 44 , image and video restoration [45][46][47][48][49] , game playing agents 50,51 , and drug discovery 52,53 . In this work, we explore how our attention-based architecture, CrabNet, performs in predicting materials properties relative to the common modeling techniques Roost, ElemNet, and random forest (RF) for regression-type problems.…”

Section: Introductionmentioning

confidence: 99%

Compositionally restricted attention-based network for materials property predictions

et al. 2021

View full text Add to dashboard Cite

In this paper, we demonstrate an application of the Transformer self-attention mechanism in the context of materials science. Our network, the Compositionally Restricted Attention-Based network (), explores the area of structure-agnostic materials property predictions when only a chemical formula is provided. Our results show that ’s performance matches or exceeds current best-practice methods on nearly all of 28 total benchmark datasets. We also demonstrate how ’s architecture lends itself towards model interpretability by showing different visualization approaches that are made possible by its design. We feel confident that and its attention-based framework will be of keen interest to future materials informatics researchers.

show abstract

“…The NDT is based on the BERT encoder [5] with modifications for application to neuroscientific datasets, specifically multi-electrode spiking activity. Modifications are needed as spiking activity has markedly different statistics than both language data and other time series [8,35] previously modeled by Transformers. Further, neuroscientific datasets are generally much smaller than typical dataset sizes in other machine learning domains, necessitating careful training decisions [9] 1 .…”

Section: Introductionmentioning

confidence: 99%

Representation learning for neural population activity with Neural Data Transformers

Pandarinath

2021

Preprint

View full text Add to dashboard Cite

Neural population activity is theorized to reflect an underlying dynamical structure. This structure can be accurately captured using state space models with explicit dynamics, such as those based on recurrent neural networks (RNNs). However, using recurrence to explicitly model dynamics necessitates sequential processing of data, slowing real-time applications such as brain-computer interfaces. Here we introduce the Neural Data Transformer (NDT), a non-recurrent alternative. We test the NDT’s ability to capture autonomous dynamical systems by applying it to synthetic datasets with known dynamics and data from monkey motor cortex during a reaching task well-modeled by RNNs. The NDT models these datasets as well as state-of-the-art recurrent models. Further, its non-recurrence enables 3.9ms inference, well within the loop time of real-time applications and more than 6 times faster than recurrent baselines on the monkey reaching dataset. These results suggest that an explicit dynamics model is not necessary to model autonomous neural population dynamics.Codegithub.com/snel-repo/neural-data-transformers.

show abstract

Pop Music Transformer

Cited by 193 publications

References 14 publications

Before, Between, and After: Enriching Robot Communication Surrounding Collaborative Creative Activities

Before, Between, and After: Enriching Robot Communication Surrounding Collaborative Creative Activities

Compositionally restricted attention-based network for materials property predictions

Representation learning for neural population activity with Neural Data Transformers

Contact Info

Product

Resources

About