Piano Transcription in the Studio Using an Extensible Alternating Directions Framework

Ewert, Sebastian; Sandler, Mark

doi:10.1109/taslp.2016.2593801

Cited by 18 publications

(36 citation statements)

References 52 publications


“…those close to the decision boundary. In [3], the corresponding threshold amin was derived from user input as this threshold depends on the recording level. More precisely, the user is asked to provide an example of a note having the lowest intensity to be expected in a recording session (during the evaluation this was one value for the entire dataset and not specific to recordings).…”

Section: Thresholding Based On Glasberg-Moore Model (mentioning)

confidence: 99%

“…the activations for the onset part of each note pattern in P . The same representation was used in [3] for the final onset detection. We used LSTM networks in two different configurations.…”

Section: LSTM-Based Decoding (mentioning)

confidence: 99%

“…However, for numerical reasons, the underlying parameter estimation process used in [10] was biased towards specific local minima of an objective function that are likely to cause misdetections. The design goal in [3] was thus to use a signal model similar to [10] but to replace the entire parameter estimation process. The resulting method consecutively switches from simple, convex regularizers (that stabilize the initial parameter estimation process) to more complex terms (to encourage a more meaningful structure as expressed by a graphical model).…”

Section: Introduction (mentioning)

confidence: 99%

“…However, such large, complex joint distributions have recently been successfully approximated using neural networks [18] [6]. Therefore, as a second extension, we investigate here combining the method proposed in [3], which is adaptable to new acoustical conditions with minimal effort, with long short term memory (LSTM) neural networks for decoding, which essentially provide a simple musical language model on top.…”

Section: Introduction (mentioning)

confidence: 99%


An Augmented Lagrangian Method for Piano Transcription Using Equal Loudness Thresholding and LSTM-Based Decoding

Ewert

Sandler

2017

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Self Cite


A central goal in automatic music transcription is to detect individual note events in music recordings. An important variant is instrument-dependent music transcription where methods can use calibration data for the instruments in use. However, despite the additional information, results rarely exceed an f-measure of 80%. As a potential explanation, the transcription problem can be shown to be badly conditioned and thus relies on appropriate regularization. A recently proposed method employs a mixture of simple, convex regularizers (to stabilize the parameter estimation process) and more complex terms (to encourage more meaningful structure). In this paper, we present two extensions to this method. First, we integrate a computational loudness model to better differentiate real from spurious note detections. Second, we employ (Bidirectional) Long Short Term Memory networks to re-weight the likelihood of detected note constellations. Despite their simplicity, our two extensions lead to a drop of about 35% in note error rate compared to the state-of-the-art.


“…In the supervised NMF, templates were usually formed by the isolated notes of the specific piano to be transcribed. Ewert employed spectro-temporal patterns to model the temporal evolution in NMF [31]. Cheng proposed a method to model the attack and decay of notes, and all the templates were trained by a Disklavier piano [32].…”

Section: Introduction (mentioning)

confidence: 99%

A Two-Stage Approach to Note-Level Transcription of a Specific Piano

2017


This paper presents a two-stage transcription framework for a specific piano, which combines deep learning and spectrogram factorization techniques. In the first stage, two convolutional neural networks (CNNs) are adopted to recognize the notes of the piano preliminarily, and note verification for the specific individual is conducted in the second stage. The note recognition stage is independent of piano individual, in which one CNN is used to detect onsets and another is used to estimate the probabilities of pitches at each detected onset. Hence, candidate pitches at candidate onsets are obtained in the first stage. During the note verification, templates for the specific piano are generated to model the attack of note per pitch. Then, the spectrogram of the segment around candidate onset is factorized using attack templates of candidate pitches. In this way, not only the pitches are picked up by note activations, but the onsets are revised. Experiments show that CNN outperforms other types of neural networks in both onset detection and pitch estimation, and the combination of two CNNs yields better performance than a single CNN in note recognition. We also observe that note verification further improves the performance of transcription. In the transcription of a specific piano, the proposed system achieves 82% on note-wise F-measure, which outperforms the state-of-the-art.
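The note-verification step described in this abstract, factorizing a spectrogram segment against fixed templates, can be illustrated with a minimal sketch. It assumes non-negative matrix factorization with a KL-divergence multiplicative update in which only the activations are estimated. The cost function, the function name `activations_fixed_templates`, and the toy templates are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def activations_fixed_templates(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative activations H so that V is approximately W @ H.

    W (frequency bins x templates) is kept fixed; only H is updated.
    Uses the standard KL-divergence multiplicative update (an assumption here).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps  # denominator of the KL update
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)            # element-wise model mismatch
        H *= (W.T @ ratio) / col_sums        # keeps H non-negative
    return H

# Toy data: three hypothetical templates over six frequency bins.
W = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.5]])
H_true = np.array([[1.0, 0.0, 0.0, 2.0],
                   [0.0, 0.0, 3.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])
V = W @ H_true                               # synthetic "spectrogram" segment

H = activations_fixed_templates(V, W)
detected = H > 0.5                           # illustrative threshold, not from the paper
```

In this toy setting the templates do not overlap in frequency, so the estimated activations match the ones used to build `V`. Real piano templates overlap heavily, which is where the regularization discussed in the cited work becomes relevant.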


AI Music Mixing Systems

Moffat

2021

Handbook of Artificial Intelligence for Music

