2017
DOI: 10.48550/arxiv.1709.04396
Preprint

A Tutorial on Deep Learning for Music Information Retrieval

Keunwoo Choi,
György Fazekas,
Kyunghyun Cho
et al.

Abstract: Following their success in Computer Vision and other areas, deep learning techniques have recently become widely adopted in Music Information Retrieval (MIR) research. However, the majority of works aim to adopt and assess methods that have been shown to be effective in other domains, while there is still a great need for more original research focusing on music primarily and utilising musical knowledge and insight. The goal of this paper is to boost the interest of beginners by providing a comprehensive tutor…

Cited by 21 publications (35 citation statements) | References 70 publications
“…N refers to the number of feature maps (set to 32 for all layers in this paper), while W refers to the weight matrix of the fully-connected output layer. [Table residue from the citing paper: convolutional layers with (3, 3) kernels and (4, 4) pooling, a fully-connected layer (50), an output (50); [7] and Kapre [8].] In total, 224,242 tracks are used, split into train/validation/test sets of 201,672/12,633/28,537 tracks respectively.…”
Section: Experiments and Discussion (mentioning)
confidence: 99%
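The fragment above garbles a table from the citing paper, but its outline is a small convolutional tagger: 32 feature maps per layer, (3, 3) kernels, (4, 4) pooling, and a 50-unit fully-connected output. A hedged Keras sketch of that shape; the input size and the number of conv blocks are assumptions for illustration, not values from the quote:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_tagger(input_shape=(96, 1366, 1), n_tags=50):
    """Mel-spectrogram in, 50 tag probabilities out.

    The 96 x 1366 input and the four conv blocks are assumptions;
    only N=32, the (3, 3)/(4, 4) sizes, and the 50-unit output
    appear in the quoted statement.
    """
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for _ in range(4):                 # N = 32 feature maps in every conv layer
        x = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((4, 4), padding='same')(x)
    x = layers.Flatten()(x)
    # W: weight matrix of the fully-connected output layer (50 tags).
    outputs = layers.Dense(n_tags, activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs)

model = build_tagger()
model.compile(optimizer='adam', loss='binary_crossentropy')
```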
“…As a result, decibel-scaled mel-spectrograms always outperform the linear versions, as shown in Fig. 5, where the same-coloured bars should be compared within {1 vs. 2} and {1s vs. 2s}. Colours indicate normalization schemes, while {1 vs. 1s} and {2 vs. 2s} compare the effect of training-set size; both are explained in Section 2.3. The additional work introduced by not using decibel scaling can be roughly estimated by comparing these scores to those of the networks trained with a limited training set (the pink bar charts on the right of Fig. 5).…”
Section: Log-Compression of Magnitudes (mentioning)
confidence: 91%
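The comparison here hinges on log-compression (decibel scaling) of mel-spectrogram magnitudes. A minimal sketch with librosa, where the file name and parameters are placeholders rather than the cited paper's setup:

```python
import librosa
import numpy as np

y, sr = librosa.load('example.wav', sr=22050)        # placeholder audio file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=96)

mel_linear = mel                                     # linear magnitudes
mel_db = librosa.power_to_db(mel, ref=np.max)        # decibel-scaled version

# The quoted finding: models trained on mel_db consistently outperform
# models trained on mel_linear, under every normalization scheme tested.
```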
“…Computing the STFT to obtain features with high resolution in frequency leads to features with low resolution in time, and vice versa [6]. Most audio processing approaches prefer an auditory-motivated frequency scale such as Mel, Bark, or log scaling rather than a linear frequency scale [7], [8]. However, it is usually not easy to reconstruct the time-domain signals from those types of features.…”
Section: Introduction (mentioning)
confidence: 99%
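Both points, the time-frequency resolution trade-off of the STFT and the auditory-motivated re-scaling of the frequency axis, are easy to see in code. A short illustrative sketch using librosa; the window sizes are chosen for contrast, not taken from the cited work:

```python
import librosa
import numpy as np

y, sr = librosa.load(librosa.ex('trumpet'))   # bundled example clip

# Long window: fine frequency resolution (sr/4096 Hz per bin), coarse in time.
S_long = np.abs(librosa.stft(y, n_fft=4096, hop_length=1024))

# Short window: coarse in frequency, fine in time; the trade-off noted in [6].
S_short = np.abs(librosa.stft(y, n_fft=256, hop_length=64))
print(S_long.shape, S_short.shape)            # (2049, few frames) vs. (129, many frames)

# Auditory-motivated frequency scale: a mel filterbank applied to the
# power spectrogram, one of the Mel/Bark/log options mentioned in [7], [8].
mel_fb = librosa.filters.mel(sr=sr, n_fft=4096, n_mels=128)
S_mel = mel_fb @ (S_long ** 2)
```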