2017
DOI: 10.3390/app7090901
|View full text |Cite
|
Sign up to set email alerts
|

A Two-Stage Approach to Note-Level Transcription of a Specific Piano

Abstract: This paper presents a two-stage transcription framework for a specific piano, which combines deep learning and spectrogram factorization techniques. In the first stage, two convolutional neural networks (CNNs) are adopted to recognize the notes of the piano preliminarily, and note verification for the specific individual is conducted in the second stage. The note recognition stage is independent of piano individual, in which one CNN is used to detect onsets and another is used to estimate the probabilities of … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
10
0

Year Published

2018
2018
2021
2021

Publication Types

Select...
3
3
1

Relationship

0
7

Authors

Journals

citations
Cited by 15 publications
(10 citation statements)
references
References 27 publications
0
10
0
Order By: Relevance
“…Various models such as support vector machines (SVM) [5,6], restricted Boltzmann machines (RBM) [4], long-short term memory neural networks [7], and convolutional neural networks (CNN) [8,9] have been developed to tackle this task. For example, Wang et al [10] integrate non-negative matrix factorization (NMF) with a CNN in order to improve transcription accuracy. Hawthorne et al [11] split the AMT into three sub-tasks: onset detection, frame activation, and velocity estimation, which allows them to achieve state-of-the art transcription accuracy on piano music.…”
Section: Introductionmentioning
confidence: 99%
“…Various models such as support vector machines (SVM) [5,6], restricted Boltzmann machines (RBM) [4], long-short term memory neural networks [7], and convolutional neural networks (CNN) [8,9] have been developed to tackle this task. For example, Wang et al [10] integrate non-negative matrix factorization (NMF) with a CNN in order to improve transcription accuracy. Hawthorne et al [11] split the AMT into three sub-tasks: onset detection, frame activation, and velocity estimation, which allows them to achieve state-of-the art transcription accuracy on piano music.…”
Section: Introductionmentioning
confidence: 99%
“…This onset-aware model significantly reduced note-level false positive errors, which is critical in perceptual evaluation of the transcription. Similar multi-state note modeling approaches are found in [4,[7][8][9][10] and some detect even more phases of note envelope including onset, sustain and offset [11,12]. As such, various versions of note state representations have been suggested so far and showed improved performances.…”
Section: Introductionmentioning
confidence: 69%
“…Most of recent approaches in polyphonic piano transcription are based on deep learning. The model architectures are diverse, including CNN [2,3,10,12], RNN [9,15], CRNN [4,5,16], and U-Net [17]. The loss function is typically the cross-entropy between predicted and ground truth labels but also includes the adversarial loss [5].…”
Section: Multi-state Note Modelingmentioning
confidence: 99%
See 1 more Smart Citation
“…Comprehensive experimental comparison of DNN-and NMF-based ADT methods have been reported in [3]. Convolutional neural networks (CNNs), for example, have been used for extracting local time-frequency features from an input spectrogram [11][12][13][14][15]. Recurrent neural networks (RNNs) are expected to learn the temporal dynamics inherent in music and have successfully been used, often in combination with CNNs, for estimating the smooth onset probabilities of drum sounds at the frame level [16][17][18].…”
Section: Introductionmentioning
confidence: 99%