Music generation has generally been focused on either creating scores or interpreting them. We discuss the differences between these two problems and propose that it may in fact be valuable to work in the space of direct performance generation: jointly predicting the notes and their expressive timing and dynamics. We consider the significance and qualities of the data set needed for this. Having identified both a problem domain and the characteristics of an appropriate data set, we present an LSTM-based recurrent network model that subjectively performs quite well on this task. Critically, we provide generated examples. We also include feedback from professional composers and musicians about some of these examples. Recognizing that "talking about music is like dancing about architecture," we kindly ask the reader to listen to the linked audio in order to effectively understand the motivation, data, results, and conclusions of this paper. As this research is ultimately about producing music, we believe the actual results are most effectively perceived (indeed, only perceived) in the audio domain. This will provide necessary context for the verbal descriptions in the rest of the paper.
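To make the modeling task concrete, here is a minimal sketch of the kind of LSTM-based event model the abstract describes, assuming the performance is serialized into a single token stream (e.g. note-on/note-off, time-shift, and velocity events). The class name, vocabulary size, and layer sizes are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PerformanceRNN(nn.Module):
    """Sketch of an LSTM language model over performance-event tokens."""
    def __init__(self, vocab_size=388, embed_dim=256, hidden_dim=512, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)            # (batch, time, embed_dim)
        out, state = self.lstm(x, state)  # (batch, time, hidden_dim)
        return self.head(out), state      # logits over the next event
```

Sampling from such a model token by token yields both the notes and their timing and dynamics, since all three are encoded in the same event stream.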
We show how a neural network can be used to allow a mobile robot to derive an accurate estimate of its location from noisy sonar sensors and noisy motion information. The robot's model of its location is in the form of a probability distribution across a grid of possible locations. This distribution is updated using both the motion information and the predictions of a neural network that maps locations into likelihood distributions across possible sonar readings. By predicting sonar readings from locations, rather than vice versa, the robot can handle the highly non-Gaussian noise in the sonar sensors. By using the constraint provided by the noisy motion information, the robot can use previous readings to improve its estimate of its current location. By treating the resulting estimates as if they were correct, the robot can learn the relationship between location and sonar readings without requiring an external supervision signal that specifies the actual location of the robot. It can learn to locate itself in a new environment with almost no supervision, and it can maintain its location ability even when the environment is nonstationary.
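The update described here is essentially Bayesian filtering over a location grid: a prediction step driven by the noisy motion estimate and a correction step that reweights each cell by the network's predicted likelihood of the observed sonar reading. The NumPy sketch below illustrates that structure; `likelihood_net`, the grid shape, and the motion kernel are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def motion_update(belief, motion_kernel):
    # Prediction step: spread the belief according to the noisy motion
    # estimate, encoded here as a small 2-D kernel (a placeholder model).
    predicted = convolve2d(belief, motion_kernel, mode="same", boundary="wrap")
    return predicted / predicted.sum()

def measurement_update(belief, sonar_reading, likelihood_net):
    # Correction step: reweight every grid cell by how likely the observed
    # sonar reading would be if the robot were located there.
    likelihood = np.array([
        [likelihood_net(location=(i, j), reading=sonar_reading)
         for j in range(belief.shape[1])]
        for i in range(belief.shape[0])
    ])
    posterior = belief * likelihood
    return posterior / posterior.sum()
```

Because the network predicts readings from locations rather than the reverse, the correction step only needs pointwise likelihoods, which is what lets the filter cope with multimodal, non-Gaussian sensor noise.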
We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network that is trained to jointly predict onsets and frames. Our model predicts pitch onset events and then uses those predictions to condition framewise pitch predictions. During inference, we restrict the predictions from the framewise detector by not allowing a new note to start unless the onset detector also agrees that an onset for that pitch is present in the frame. We focus on improving onsets and offsets together, rather than either in isolation, as we believe this correlates better with human musical perception. Our approach results in over a 100% relative improvement in note F1 score (with offsets) on the MAPS dataset. Furthermore, we extend the model to predict relative velocities of normalized audio, which results in more natural-sounding transcriptions.
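As a rough illustration of the inference-time constraint described above, the sketch below thresholds the two detectors and lets a pitch become active only if it was already sounding in the previous frame or the onset detector fires in the same frame. The threshold value and array shapes are assumptions for illustration, not the paper's exact post-processing.

```python
import numpy as np

def restrict_frames(frame_probs, onset_probs, threshold=0.5):
    """frame_probs, onset_probs: (num_frames, num_pitches) probability arrays."""
    frames = frame_probs > threshold
    onsets = onset_probs > threshold
    active = np.zeros_like(frames, dtype=bool)
    for t in range(frames.shape[0]):
        for p in range(frames.shape[1]):
            if frames[t, p]:
                previously_on = t > 0 and active[t - 1, p]
                # Keep a sustained note, or start a new one only with an onset.
                if previously_on or onsets[t, p]:
                    active[t, p] = True
    return active
```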