2018
DOI: 10.3390/app8040606

End-to-End Neural Optical Music Recognition of Monophonic Scores

Abstract: Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved by using a neural model that combines the capabilities of convolutional neural networks, which work on the input image, and recurrent neural networks, which deal with the sequential nature of the pro…

Cited by 64 publications (83 citation statements)
References 40 publications
“…We use the Symbol Error Rate (SER) [17,18,19] metric. Similarly to Word Error Rate (WER) [28], commonly used in the text recognition community, SER is computed as the Levenshtein distance: the sum of edit operations that are needed to convert the output of our method into the ground truth in terms of symbol insertions (I), substitutions (S) and deletions (D).…”
Section: Discussion | Citation type: mentioning
confidence: 99%
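
The SER computation described in this statement can be sketched in a few lines. This is a minimal illustration and not the citing authors' implementation: the function names `edit_distance` and `symbol_error_rate`, the example symbol strings, and the normalization by ground-truth length (as is usual for WER) are all assumptions made here.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two symbol sequences, unit cost per edit."""
    # prev[j] holds the distance between the hypothesis prefix processed so far
    # and the first j reference symbols.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # deletion: drop h from the hypothesis
                curr[j - 1] + 1,     # insertion: add r to the hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


def symbol_error_rate(hyp, ref):
    """Edit distance divided by ground-truth length (assumed normalization)."""
    if not ref:
        raise ValueError("ground-truth sequence must not be empty")
    return edit_distance(hyp, ref) / len(ref)


# Hypothetical symbol vocabulary, for illustration only.
prediction = ["clef-G2", "note-C4_quarter", "barline"]
ground_truth = ["clef-G2", "note-D4_quarter", "note-E4_quarter", "barline"]
print(symbol_error_rate(prediction, ground_truth))
```

Here one substitution and one insertion turn the prediction into the ground truth, so the distance is 2 over 4 reference symbols.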
“…For example, Van der Wel et al. [17] use Convolutional Neural Networks (CNNs) and sequence-to-sequence (seq2seq) models for recognizing monophonic printed music scores. Calvo-Zaragoza et al. [18,19] also use a CNN to extract features from printed music scores and feed a Recurrent Neural Network. To avoid the alignment between the music score and the ground-truth data, they use the Connectionist Temporal Classification (CTC) loss function commonly used in speech and text recognition.…”
Section: Deep Learning-based Approaches | Citation type: mentioning
confidence: 99%
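
One reason CTC removes the need for frame-level alignment is its many-to-one mapping from per-frame labels to an output sequence: consecutive repeats are merged, then a special blank label is dropped. The sketch below shows only that collapse rule as used in greedy decoding, not the loss itself and not the cited model. The name `ctc_collapse`, the `"<blank>"` token, and the example labels are assumptions for illustration.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical name for the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label path to an output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank labels.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Hypothetical per-frame predictions across the width of a staff image.
frames = ["clef", "clef", BLANK, "note", "note", BLANK, "note", BLANK, BLANK]
print(ctc_collapse(frames))
```

The blank between the two runs of `"note"` is what lets the same symbol appear twice in a row in the output. Many different frame paths collapse to the same sequence, which is why training needs only the symbol sequence and not its horizontal position in the image.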
“…The authors of [3], following the work of [17], observe that it does not make sense to directly apply the standard Unix diff utility to XML score files. A possible solution is to extract a linear representation of the graphical content [6], but motivated by the hierarchical structure of note beaming and tuplet grouping, we chose to follow another approach and compare scores in terms of hierarchical structure, by using a tree-edit distance based on tree node operations, as proposed by [27] or [7].…”
Section: Figure | Citation type: mentioning
confidence: 99%
“…At a detailed level, it is very valuable for musicologists and developers of version control systems to get precise clues on the locations of the differences between scores (e.g., between two editions of the same score). One difficulty that immediately arises in defining a diff tool for music scores is that, due to the nature and complexity of the music language, a music score contains multiple levels [6,10] that can be compared.…”
Section: Introduction | Citation type: mentioning
confidence: 99%