Recent research on piano transcription has focused primarily on note events. Very few studies have investigated pedalling techniques, which form an important aspect of expressive piano performance. In this paper, we propose a novel method for piano sustain-pedal detection based on Convolutional Neural Networks (CNNs). Inspired by the different acoustic characteristics at the start of a pedalled segment (pedal onset) versus during the segment itself, two CNN-based binary classifiers are trained separately to learn both temporal dependencies and timbral features. Their outputs are fused to decide whether a portion of a piano recording is played with the sustain pedal. The proposed architecture and our detection system are assessed using a dataset with frame-wise pedal on/off annotations. An average F1 score of 0.74 is obtained on the test set. The method performs better on pieces by Romantic-era composers, who used pedalling techniques to bring more colours to the piano sound.
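The fusion of the two classifiers' outputs could be sketched as follows. This is a minimal illustration, not the paper's actual fusion rule: it assumes each classifier emits a frame-wise probability and simply averages the two before thresholding.

```python
import numpy as np

def fuse_pedal_predictions(onset_probs, segment_probs, threshold=0.5):
    """Fuse frame-wise outputs of the pedal-onset classifier and the
    pedalled-segment classifier into a pedal on/off decision.
    Hypothetical fusion rule: average the probabilities, then threshold."""
    fused = (np.asarray(onset_probs) + np.asarray(segment_probs)) / 2.0
    return fused >= threshold

# Example: three frames scored by both classifiers
decisions = fuse_pedal_predictions([0.9, 0.2, 0.6], [0.8, 0.1, 0.7])
```

In practice, a learned fusion (e.g. a small classifier over both outputs) could replace the fixed average, but the averaging rule keeps the sketch self-contained.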
Personalized recommendation of new track releases has always been a challenging problem in the music industry. To address this problem, we first use the user's listening history and demographics to construct a user embedding representing their music preference. With the user embedding and audio data from the user's liked and disliked tracks, an audio embedding can be obtained for each track using metric learning with Siamese networks. For a new track, we can then decide the best group of users to recommend it to by computing the similarity between the track's audio embedding and each user embedding. The proposed system yields state-of-the-art performance on content-based music recommendation, tested with millions of users and tracks. We also extract audio embeddings as features for music genre classification tasks; the results demonstrate the generalization ability of our audio embeddings.
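The ranking step described above can be sketched as a cosine-similarity search over user embeddings. This is an illustrative sketch only: the paper does not specify the similarity measure or embedding dimensionality, so both are assumptions here.

```python
import numpy as np

def rank_users_for_track(track_embedding, user_embeddings):
    """Rank users by cosine similarity between a new track's audio
    embedding and each user embedding. Returns (ranked indices, sims).
    Assumes L2-normalisable, same-dimension embeddings."""
    t = track_embedding / np.linalg.norm(track_embedding)
    u = user_embeddings / np.linalg.norm(user_embeddings, axis=1, keepdims=True)
    sims = u @ t                      # cosine similarity per user
    return np.argsort(-sims), sims    # best-matching users first

# Toy 2-D embeddings: user 0 and user 2 lean toward the track's direction
track = np.array([1.0, 0.0])
users = np.array([[1.0, 0.1], [0.0, 1.0], [0.7, 0.7]])
order, sims = rank_users_for_track(track, users)
```

The top of `order` would then form the "best group of users" to target with the new release.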
This paper presents a study of piano pedalling gestures and techniques on the sustain pedal from the perspectives of measurement, recognition, and visualization. Pedalling gestures can be captured by a dedicated measurement system in which sensor data are recorded simultaneously with the piano sound under normal playing conditions. Using the sensor data collected from this system, recognition comprises two separate tasks: pedal onset/offset detection and classification by technique. The onset and offset times of each pedalling gesture were computed using signal processing algorithms. Based on features extracted from every segment in which the pedal is pressed, the segments were classified by pedalling technique using machine learning methods; we compared Support Vector Machines (SVMs) and hidden Markov models (HMMs) for this task. Our system achieves high accuracy, with an F1 score above 0.7 for every technique and above 0.9 on average. The recognition results can be represented using novel pedalling notations and visualized in an audio-based score-following application.
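The onset/offset detection step could be sketched as simple threshold crossing on the pedal-depth sensor signal. This is an assumed, minimal detector for illustration; the paper's actual signal processing algorithms are not detailed here, and the threshold value is arbitrary.

```python
import numpy as np

def detect_pedal_onsets_offsets(sensor, threshold=0.5):
    """Detect pedal onset/offset frame indices from a 1-D pedal-depth
    sensor signal by threshold crossing (hypothetical sketch).
    An onset is a rising crossing, an offset a falling crossing."""
    pressed = np.asarray(sensor) >= threshold
    edges = np.diff(pressed.astype(int))
    onsets = np.flatnonzero(edges == 1) + 1    # first pressed frame
    offsets = np.flatnonzero(edges == -1) + 1  # first released frame
    return onsets, offsets

# Toy sensor trace with two press-release gestures
sensor = [0.0, 0.1, 0.8, 0.9, 0.3, 0.0, 0.7, 0.2]
onsets, offsets = detect_pedal_onsets_offsets(sensor)
```

Each detected press-release pair delimits a segment from which classification features (for the SVM or HMM) would then be extracted.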