2017
DOI: 10.48550/arxiv.1710.11153
Preprint

Onsets and Frames: Dual-Objective Piano Transcription

Abstract: We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames. Our model predicts pitch onset events and then uses those predictions to condition framewise pitch predictions. During inference, we restrict the predictions from the framewise detector by not allowing a new note to start unless the onset detector also agrees that an onset for that pitch is present in the frame. We focus on imp…
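
The onset-gated inference rule described in the abstract can be summarized in a few lines. Below is a minimal, hypothetical sketch (the variable names, the 0.5 threshold, and the frame/pitch array layout are assumptions, not details from the paper): a framewise activation may only start a new note in a frame where the onset detector also fires for that pitch, while an already-sounding note may continue on frame evidence alone.

```python
import numpy as np

def onset_gated_decode(frame_probs, onset_probs, threshold=0.5):
    """Gate framewise pitch activations with onset predictions.

    frame_probs, onset_probs: float arrays of shape (num_frames, num_pitches).
    Returns a boolean piano roll in which a note can only begin in a frame
    where the onset detector also exceeds the threshold.
    """
    frames = frame_probs > threshold
    onsets = onset_probs > threshold
    active = np.zeros_like(frames, dtype=bool)

    for t in range(frames.shape[0]):
        for p in range(frames.shape[1]):
            if not frames[t, p]:
                continue  # frame detector is silent for this pitch
            was_active = t > 0 and active[t - 1, p]
            # A new note may only start if an onset is also detected here;
            # a note that is already sounding may continue without one.
            if was_active or onsets[t, p]:
                active[t, p] = True
    return active
```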

Cited by 12 publications (26 citation statements)
References 16 publications

“…Indeed, for piano performance videos from the Internet (which usually lack accompanying ground-truth MIDI), we retrieve pseudo ground-truth (GT) MIDI from the audio with the Onsets and Frames framework [25]. This allows us to avoid the hardware constraints of the instrument and to use any video, even those recorded in an unconstrained setup.…”
Section: Methods
confidence: 99%
“…Such MIDI is typically obtained with an electronic keyboard, a process that limits how much training data can be created. To overcome this challenge, the Onsets and Frames framework makes it possible to transcribe an audio waveform to MIDI [25]. A recent work used this framework to obtain pseudo ground-truth MIDI and implemented a ResNet [26] to predict pitch onset events (the times and identities of the keys being pressed) from a stream of video frames [27].…”
Section: Related Work
confidence: 99%
“…As a result, there are a large number of transcription models whose success relies on hand-designed representations for piano transcription. For instance, the Onsets & Frames model (Hawthorne et al., 2017) uses dedicated outputs for detecting piano onsets and the note being played; Kelz et al. (2019) model the entire amplitude envelope of a piano note; and Kong et al. (2020) additionally model piano foot pedal events (a piano-specific way of controlling a note's sustain). Single-instrument transcription models have also been developed for other instruments such as guitar (Xi et al., 2018) and drums (Cartwright & Bello, 2018; Callender et al., 2020), though these instruments have received less attention than piano.…”
Section: Music Transcription
confidence: 99%
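
To make the “dedicated outputs” idea above concrete, here is a minimal, hypothetical sketch of a dual-head module in PyTorch; the layer types, sizes, and the stop-gradient on the onset probabilities are simplifying assumptions rather than the paper's exact architecture (which uses a convolutional acoustic front end and BiLSTM layers).

```python
import torch
import torch.nn as nn

class DualObjectiveHead(nn.Module):
    """Illustrative dual-output head: one set of logits for note onsets,
    one for framewise note activity, with frames conditioned on onsets."""

    def __init__(self, feature_dim: int = 512, num_pitches: int = 88):
        super().__init__()
        self.onset_head = nn.Linear(feature_dim, num_pitches)
        # The frame head sees the shared features plus the onset
        # probabilities, so framewise predictions are conditioned on onsets.
        self.frame_head = nn.Linear(feature_dim + num_pitches, num_pitches)

    def forward(self, features: torch.Tensor):
        # features: (batch, time, feature_dim) from some acoustic model
        onset_logits = self.onset_head(features)
        onset_probs = torch.sigmoid(onset_logits)
        frame_input = torch.cat([features, onset_probs.detach()], dim=-1)
        frame_logits = self.frame_head(frame_input)
        return onset_logits, frame_logits
```

Both heads would typically be trained with binary cross-entropy against onset and frame labels, which is the dual objective referred to in the title.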
“…Onsets express the beginning of musical notes and are the most basic form of musical rhythm [18,12,2]. Beats are another form of rhythm, and there is a large body of work on beat detection [4,25,26].…”
Section: Introduction
confidence: 99%