Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity

Manilow, Ethan; Wichern, Gordon; Seetharaman, Prem; Le Roux, Jonathan

doi:10.1109/waspaa.2019.8937170

Cited by 71 publications

(40 citation statements)

References 16 publications


“…The dataset used in this work is the Slakh2100 dataset [5]. The MIDI files are aligned with the audio files and are used as the ground truth pitch and duration of the notes.…”

Section: Results (mentioning)

confidence: 99%

Timbre Comparison in Note Tracking from Onset, Frames and Pitch Estimation

Oliván, Pinilla, Blázquez

2020

Jorn. jovenes investig. I3A


Note Tracking (NT) is a subtask of Automatic Music Transcription (AMT) which is a critical problem in the field of Music Information Retrieval (MIR). The aim of this work is to compare the performance of two models, one for onsets and frames prediction and another one with pitch detection and a note tracking algorithm in order to study the behaviour of different timbres and families of instruments in note tracking subtasks.

“…We test the f0 detection and note tracking algorithm (Algorithm 1) on the Slakh2100 dataset [31], which provides MIDI files and their synthesized audio files, and we compare the performance of this approach with the OaF model [4]. It is important to note that the tests have been performed on the same dataset, although the OaF model has been trained only with the MAESTRO dataset [17], for piano transcription.…”

Section: Datasets (mentioning)

confidence: 99%

“…For our second method, we train the CNN model for polyphonic transcription with the Slakh2100 dataset [31]. It contains 145 h of mixtures, with a total of 2100 automatically mixed tracks synthesized from their corresponding MIDI files.…”

Section: Datasets (mentioning)

confidence: 99%

A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription

et al. 2021


Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres of different instruments can be an issue that has not yet been studied in depth. The goal of this work is to address AMT by analyzing how timbre affects monophonic transcription in a first approach based on the CREPE neural network, and then to improve the results by performing polyphonic music transcription with different timbres in a second approach based on the Deep Salience model, which performs polyphonic transcription based on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results, and the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT on piano sounds, such as Google Magenta Onsets and Frames (OaF). Our polyphonic transcription model outperforms the state-of-the-art model for non-piano instruments; for bass instruments, for example, it reaches an F-score of 0.9516 versus 0.7102. In our latest experiment we also show how adding an onset detector to our model can improve on the results given in this work.
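The F-scores quoted in the abstract above are presumably the usual harmonic mean of precision and recall. The following is a minimal sketch of that formula, not the authors' evaluation code; the function name `f_score` and the counts used in the example are hypothetical and chosen only for illustration.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-measure, the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives  # notes the model reported
    actual = true_positives + false_negatives     # notes in the ground truth
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
example = f_score(true_positives=90, false_positives=10, false_negatives=20)
print(round(example, 4))
```

How a predicted note is matched to a reference note (onset tolerance, pitch tolerance, whether offsets count) is decided before these counts are formed, and the abstract does not state it.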


“…We used the Slakh2100-split2 (Slakh) [62] and the RWC Popular Music Database (RWC) [63] for evaluation because these datasets include ground-truth beat times. The Slakh dataset contains 2100 musical pieces in which the audio signals were synthesized from the Lakh MIDI dataset [64] using professional-grade virtual instruments, and the RWC dataset contains 100 Japanese popular songs.…”

Section: Evaluation Data (mentioning)

confidence: 99%

Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

Ishizuka, Nishikimi, Yoshii

2021

Signals


This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal in contrast to most conventional ADT methods that estimate the frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder for extracting the latent features from a music signal and a tatum-level decoder for estimating a drum score from the latent features pooled at the tatum level. To capture the global repetitive structure of drum scores, which is difficult to learn with a recurrent neural network (RNN), we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder. To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data and to improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism pretrained from an extensive collection of drum scores. The experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure, even when only a limited amount of paired data was available so that the non-regularized model underperformed the RNN-based model.
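The step in which frame-level latent features are "pooled at the tatum level" can be illustrated with a small sketch. It assumes average pooling and assumes tatum boundaries given as frame indices; the abstract specifies neither, and the helper name `pool_frames_to_tatums` is hypothetical.

```python
from typing import List, Sequence


def pool_frames_to_tatums(
    frames: Sequence[Sequence[float]], boundaries: Sequence[int]
) -> List[List[float]]:
    """Average frame-level feature vectors within each tatum segment.

    `boundaries` holds frame indices; tatum k covers frames
    boundaries[k] up to (but not including) boundaries[k + 1].
    Average pooling is an assumption made for this sketch.
    """
    pooled = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        segment = frames[start:end]
        if not segment:
            raise ValueError("empty tatum segment")
        dim = len(segment[0])
        pooled.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return pooled


# Four frames of 2-dimensional features, grouped into two tatums.
frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
tatums = pool_frames_to_tatums(frames, boundaries=[0, 2, 4])
print(tatums)  # [[2.0, 1.0], [6.0, 5.0]]
```

The output has one vector per tatum, so a decoder operating on it works at the resolution of the score and not of the audio frames.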
