2021
DOI: 10.3390/signals2030031

Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

Abstract: This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal, in contrast to most conventional ADT methods, which estimate the frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder for extracting the latent features from a music signal and a tatum-level decoder for estimating a drum score from the latent features pooled at the tatum level. To c…
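To make the encoder–decoder idea in the abstract concrete, the sketch below shows one way such a model could be wired up in PyTorch: a convolutional frame-level encoder, max-pooling of frame features onto a tatum grid supplied by an external beat/tatum tracker, and a self-attention decoder that outputs tatum-level drum activations. All module names, layer sizes, the choice of max-pooling, and the three-drum output (e.g., bass drum, snare drum, hi-hat) are illustrative assumptions rather than the authors' published implementation.

```python
# Minimal sketch of the encoder / tatum-pooling / decoder idea described in the
# abstract. Module names, layer sizes, the max-pooling choice, and the
# three-drum output are illustrative assumptions, not the authors' published model.
import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    """Convolutional encoder turning a log-mel spectrogram into frame-level features."""

    def __init__(self, n_mels=80, latent_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(32 * n_mels, latent_dim)

    def forward(self, spec):                      # spec: (batch, frames, n_mels)
        x = self.conv(spec.unsqueeze(1))          # (batch, 32, frames, n_mels)
        x = x.permute(0, 2, 1, 3).flatten(2)      # (batch, frames, 32 * n_mels)
        return self.proj(x)                       # (batch, frames, latent_dim)


def pool_to_tatums(frame_feats, tatum_ids, n_tatums):
    """Max-pool frame-level features onto a tatum grid.

    tatum_ids (int64, shape (batch, frames)) maps each frame to its tatum index,
    e.g. from an external beat/tatum tracker. Tatums that receive no frames keep
    -inf and would need masking in a real system.
    """
    batch, _, dim = frame_feats.shape
    pooled = frame_feats.new_full((batch, n_tatums, dim), float("-inf"))
    idx = tatum_ids.unsqueeze(-1).expand(-1, -1, dim)
    return pooled.scatter_reduce(1, idx, frame_feats, reduce="amax")


class TatumDecoder(nn.Module):
    """Self-attention decoder estimating tatum-level drum activations."""

    def __init__(self, latent_dim=256, n_drums=3, n_layers=4, n_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=n_heads,
                                           batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(latent_dim, n_drums)

    def forward(self, tatum_feats):               # (batch, tatums, latent_dim)
        return torch.sigmoid(self.head(self.attn(tatum_feats)))
```

A full pipeline would chain the three pieces, e.g. `TatumDecoder()(pool_to_tatums(FrameEncoder()(spec), tatum_ids, n_tatums))`, and threshold the resulting activations into a tatum-level drum score.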

Cited by 4 publications (13 citation statements). References 39 publications (42 reference statements).

Citation statements (ordered by relevance):
“…Due to their success, CNNs and RNNs are used together in an architecture named CRNN to model both acoustic and sequential features [2,3,5,9,10,12]. Recently, however, RNNs have been more commonly replaced by self-attention mechanisms [7], since this technique offers parallel computation and better performance when sufficient data are provided [4,6,16]. Finally, the learning of long-term sequential features is also performed with the help of an extra model, external to the transcription model, known as a language model [5,6].…”
Section: Architecture (mentioning)
Confidence: 99%
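The quoted statement describes CRNNs, in which convolutional layers model the acoustics and a recurrent layer models the temporal sequence, and notes that the recurrent layer is increasingly replaced by self-attention. The sketch below illustrates that swap; the layer sizes and module names are assumptions for illustration, not the architectures of the cited papers.

```python
# Illustrative sequential stage of a CRNN-style transcription model, with the
# recurrent layer optionally replaced by self-attention. Sizes and names are
# assumptions, not reproductions of the cited architectures.
import torch.nn as nn


class SequenceStage(nn.Module):
    """Sequential-modeling stage over frame-level features: BiGRU or self-attention."""

    def __init__(self, dim=256, use_self_attention=True):
        super().__init__()
        self.use_self_attention = use_self_attention
        if use_self_attention:
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            self.attn = nn.TransformerEncoder(layer, num_layers=2)
        else:
            self.rnn = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)

    def forward(self, x):                  # x: (batch, frames, dim)
        if self.use_self_attention:
            return self.attn(x)            # all frames attended to in parallel
        out, _ = self.rnn(x)               # frames processed sequentially
        return out
```

The practical point made in the quoted statement is visible in the two branches: the self-attention branch processes all frames in parallel during training, whereas the recurrent branch must step through the sequence.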
“…Recently, however, RNNs have been more commonly replaced by self-attention mechanisms [7], since this technique offers parallel computation and better performance when sufficient data are provided [4,6,16]. Finally, the learning of long-term sequential features is also performed with the help of an extra model, external to the transcription model, known as a language model [5,6]. This model is meant to leverage symbolic data only (which are much more abundant than data from annotated audio) and is trained exclusively on them.…”
Section: Architecture (mentioning)
Confidence: 99%
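The last sentence of the quoted statement refers to an external language model trained only on symbolic drum scores. One common way to use such a model at inference time is a log-linear, shallow-fusion-style combination of the acoustic posteriors with the language-model prior; the sketch below assumes that scheme, and the function name, weight, and threshold are illustrative, not taken from the cited papers.

```python
# Hypothetical shallow-fusion-style combination of an acoustic transcription
# model with an external language model trained only on symbolic drum scores.
# The combination scheme, weight, and threshold are illustrative assumptions.
import torch


def fuse_acoustic_and_language(acoustic_prob, lm_prob, lm_weight=0.3, threshold=0.5):
    """Combine per-tatum onset posteriors with a symbolic language-model prior.

    acoustic_prob, lm_prob: tensors of shape (tatums, drums) with values in (0, 1).
    Returns a binary tatum-level drum score of the same shape.
    """
    eps = 1e-7  # avoid log(0)
    fused = ((1.0 - lm_weight) * torch.log(acoustic_prob + eps)
             + lm_weight * torch.log(lm_prob + eps))
    return fused > torch.log(torch.tensor(threshold))
```

Because the language model only ever sees symbolic scores, it can be trained on the much larger corpora of drum notation mentioned in the quote and then combined with any acoustic model at decoding time.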