Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475405

ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data

Abstract: Most current supervised automatic music transcription (AMT) models lack the ability to generalize: they have trouble transcribing real-world music recordings from diverse musical genres that are not represented in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which addresses this issue by leveraging the large amount of available unlabelled music recordings. The proposed ReconVAT uses reconstruction loss and virtual adversarial training. When combi…
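The abstract names virtual adversarial training (VAT) as one of ReconVAT's two ingredients. As a rough illustration of the VAT idea only (not the paper's actual implementation), here is a NumPy sketch on a toy linear classifier: find a small perturbation of an unlabelled input that maximally changes the model's prediction, then penalize that change. The model, dimensions, and hyper-parameter values are all illustrative, and finite differences stand in for backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy stand-in for the transcription network: a fixed linear classifier.
D_IN, N_CLASSES = 8, 4
W = rng.normal(size=(D_IN, N_CLASSES))

def predict(x):
    return softmax(x @ W)

def kl(p, q, eps=1e-12):
    # Per-sample KL divergence KL(p || q).
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def vat_loss(x, epsilon=1.0, xi=1e-3, h=1e-4):
    """VAT loss with one power iteration; gradients via finite differences."""
    p = predict(x)                        # predictions on the clean input
    d = rng.normal(size=x.shape)          # random initial direction
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # Finite-difference estimate of grad_d KL(p || predict(x + xi * d)).
    g = np.zeros_like(x)
    base = kl(p, predict(x + xi * d))
    for j in range(x.shape[1]):
        d2 = d.copy()
        d2[:, j] += h
        g[:, j] = (kl(p, predict(x + xi * d2)) - base) / h
    # Adversarial perturbation of norm epsilon along the estimated gradient.
    r_adv = epsilon * g / (np.linalg.norm(g, axis=-1, keepdims=True) + 1e-12)
    return kl(p, predict(x + r_adv)).mean()

x_unlabelled = rng.normal(size=(5, D_IN))
loss = vat_loss(x_unlabelled)
```

Because the loss needs no labels, it can be computed on the unlabelled recordings the abstract mentions; in practice the gradient step is done by backpropagation through the network rather than finite differences.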


Cited by 20 publications (15 citation statements)
References 30 publications
“…Such score-audio pairs of classical music were considered to generate the MusicNet (MuN) dataset [18], an excellent resource of 330 license-free audio files with pre-synchronized pitch annotations that crucially stimulated the advancement of MPE [19]-[21]. MuN is a very challenging dataset for MPE, yielding consistently and substantially worse results (F-measure below 75% [20]) across all studies compared to piano datasets such as MAPS or MAESTRO (up to 90% F-measure [7]).…”
Section: B Multi-pitch Datasetsmentioning
confidence: 99%
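The F-measure figures quoted above are typically computed with mir_eval-style metrics. As a minimal illustration of the frame-level variant (assumed here; the cited studies may use note-level metrics as well), this sketch scores a predicted binary piano roll against a reference one:

```python
import numpy as np

def frame_f_measure(ref, est):
    """Frame-level multi-pitch precision/recall/F-measure from boolean
    piano rolls of shape (frames, pitches). Minimal mir_eval-style metric."""
    ref, est = np.asarray(ref, bool), np.asarray(est, bool)
    tp = np.logical_and(ref, est).sum()    # correctly detected pitch-frames
    fp = np.logical_and(~ref, est).sum()   # spurious detections
    fn = np.logical_and(ref, ~est).sum()   # missed pitch-frames
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Two frames, three pitches: one missed note and one spurious note.
ref = [[1, 0, 1],
       [0, 1, 0]]
est = [[1, 0, 0],
       [0, 1, 1]]
precision, recall, f_measure = frame_f_measure(ref, est)  # each 2/3
```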
“…As a major advancement, U-net models [35] were shown to improve the performance of AMT [3], [20]-[23], [36] and other MIR tasks [37], [38]. More recently, the inclusion of self-attention components into U-nets [3], [21] and other models [39] was applied successfully to MPE. Most of our architectures rely on the U-net paradigm, enhanced with self-attention components as in [3] and other extensions.…”
Section: A Previous Modelsmentioning
confidence: 99%
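The self-attention component this excerpt refers to is, at its core, scaled dot-product attention over the time axis of a feature map. Here is a single-head NumPy sketch, independent of any particular U-net; the shapes and weight matrices are illustrative stand-ins for learned parameters:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention:
    softmax(Q K^T / sqrt(d)) V, applied over the sequence axis of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # each row sums to 1
    return w @ V, w

rng = np.random.default_rng(1)
T, d = 6, 16   # e.g. T time frames of a bottleneck feature map, d channels
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)   # out: (T, d), attn: (T, T)
```

Inserted at a U-net bottleneck, such a layer lets every time frame attend to every other frame, which is useful for the long-range temporal structure of music.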