2020
DOI: 10.1121/10.0001468

Polyphonic pitch tracking with deep layered learning

Abstract: This article presents a polyphonic pitch tracking system that is able to extract both framewise and note-based estimates from audio. The system uses several artificial neural networks trained individually in a deep layered learning setup. First, cascading networks are applied to a spectrogram for framewise fundamental frequency (f0) estimation. A sparse receptive field is learned by the first network and then used as a filter kernel for parameter sharing throughout the system. The f0 activations are connected …

Cited by 20 publications (11 citation statements)
References 16 publications
“…We finally want to stress again that all these models perform well, most of them superior to previously reported results: All models outperform our previous results [30] with a smaller network similar to CNN:XS, especially for SWD. For B10, our best Unet:XL reaches Acc=87.2% as compared to a traditional method [13] with Acc<70% and a deep-learning approach [48] with up to Acc=85.6%. On PhA, DRCNN:L and Unet:XL reach up to AP=87.2%, as compared to our own MCTC-based results of AP=82.8% [30].…”
Section: E Evaluation With a Mixed Dataset (mentioning)
confidence: 94%
“…In a very thorough domain-specific treatment, Elowsson [23] constructs a hierarchical model that extracts fundamental frequency contours from spectrograms and uses these contours to infer note onsets and offsets. While it can be useful for many applications to have such intermediate representations as in Engel et al. [24], in this work we treat polyphonic transcription from audio to discrete notes as an end-to-end problem.…”
Section: Related Work 21 Piano Transcription (mentioning)
confidence: 99%
“…Though varied, much of the cutting-edge research is reliant on machine learning (ML) techniques, not necessarily seeking to better understand the underlying structures present, and opting rather to maximize efficacy of the respective approaches (Figure 1). Elowsson (2020) proposed a method for MPE that relies on "deep layered learning" (Elowsson and Friberg 2014; Elowsson 2018). It uses a multi-stage system of neural networks and processing steps to elicit pitch contours - i.e.…”
Section: Related Work (mentioning)
confidence: 99%