2011
DOI: 10.1109/tasl.2011.2134092
A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching

Abstract: In the present work, we introduce the use of Conditional Random Fields (CRFs) for the audio-to-score alignment task. This framework encompasses the statistical models used in the literature and allows for more flexible dependency structures. In particular, it allows observation functions to be computed from several analysis frames. Three different CRF models are proposed for our task, corresponding to different trade-offs between accuracy and complexity. Three types of features are used, characte…
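To make the abstract's description concrete, the following is a minimal sketch of a linear-chain CRF for alignment, assuming the hidden variables s_t are score positions and the observation functions may look at a window of 2W+1 analysis frames around frame t. This generic form, and the symbols W, f_k, g_l, λ_k, μ_l, are illustrative notation, not the exact parameterisation used in the paper:

```latex
p(s_{1:T} \mid y_{1:T}) \;=\; \frac{1}{Z(y_{1:T})}
  \exp\!\Bigg( \sum_{t=1}^{T} \Big[ \sum_{k} \lambda_k\, f_k(s_{t-1}, s_t)
  \;+\; \sum_{l} \mu_l\, g_l\big(s_t,\, y_{t-W:\,t+W}\big) \Big] \Bigg)
```

The second sum is where this framework differs from a standard HMM observation model: each feature g_l may depend on the whole window of frames y_{t-W:t+W} rather than on the current frame alone.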

Cited by 58 publications (46 citation statements); References 23 publications.
“…Automatic music transcription converts audio files to symbolic representations such as sheet music [3]; audio fingerprinting recognises specific audio recordings, for example in Shazam [32] or in query-by-humming [27]. Turning to live performance, automatic score following (audio-to-score) automatically synchronises live audio or MIDI input to a pre-composed score [15] so as to control computer-generated accompaniment, digital effects or trigger extra-musical events such as lighting and visuals [14]. More flexibly, Cypher [26] analyses MIDI data in real-time, extracting key, chord, beat and phrase group features so as to generate musical accompaniment while the Analyser plugin [29] for Digital Audio Workstations extracts real-time audio features and maps them to Open Sound Control (OSC) to control live visuals.…”
Section: Music Recognition Technologies (mentioning)
confidence: 99%
“…We exploit an HMM model, where the hidden states represent the concurrencies played at each time frame. For complexity reasons, we choose a prior model similar to the Markovian model of [8], which only constrains the concurrency sequence to follow the same structure as in the score. We assume that the concurrencies are numbered in the order in which they appear in the score.…”
Section: Adaptation of the Mapping Matrix (mentioning)
confidence: 99%
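As an illustration of the kind of score-constrained, left-to-right HMM prior this excerpt describes, here is a minimal Viterbi sketch in which the hidden states are the score concurrencies in their order of appearance and transitions may only stay on the current concurrency or advance to the next one. The observation log-likelihoods, the self-transition probability, and the function name are assumptions for illustration, not the cited paper's implementation:

```python
import numpy as np

def viterbi_left_to_right(log_lik, log_p_stay=np.log(0.7)):
    """Decode a monotone path through the score concurrencies.

    log_lik: (T, N) array of log-observation likelihoods, one row per audio
    frame and one column per concurrency (placeholder acoustic model output).
    """
    T, N = log_lik.shape
    log_p_next = np.log1p(-np.exp(log_p_stay))  # log-probability of advancing
    delta = np.full((T, N), -np.inf)            # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)           # backpointers
    delta[0, 0] = log_lik[0, 0]                 # alignment starts at the first concurrency
    for t in range(1, T):
        stay = delta[t - 1] + log_p_stay
        move = np.concatenate(([-np.inf], delta[t - 1, :-1] + log_p_next))
        delta[t] = np.maximum(stay, move) + log_lik[t]
        psi[t] = np.where(stay >= move, np.arange(N), np.arange(N) - 1)
    # Backtrack from the last concurrency (the performance is assumed complete).
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(psi[t, path[-1]])
    return path[::-1]
```

The monotone "stay or advance" transition structure is what encodes the constraint that the decoded concurrency sequence follows the order of the score.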
“…Other types of information have also been exploited, such as note onsets [7] or tempo information [8,9].…”
Section: Introduction (mentioning)
confidence: 99%
“…Inference in this model is performed using causal inference, since the goal is real-time score following in live performances. A perspective on audio-to-score alignment using Conditional Random Fields is taken by the authors of [6]. They propose models of various complexities, with the best-performing model resembling an HHMM with the duration of note events influenced by an additional tempo variable.…”
Section: Introduction (mentioning)
confidence: 99%
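To illustrate the causal (filtering) inference the excerpt contrasts with offline decoding, here is a minimal sketch that updates a filtered posterior over score positions after each incoming frame, using only past observations, and reports the current MAP position. The transition matrix, initial distribution, and function names are hypothetical placeholders, not the cited system's actual interface:

```python
import numpy as np

def online_follower(log_A, init_log_pi):
    """Return a step function yielding the MAP score position after each frame.

    log_A: (N, N) log-transition matrix, log_A[i, j] = log P(j | i).
    init_log_pi: (N,) log prior over the starting score position.
    """
    log_alpha = init_log_pi.copy()

    def step(log_b_t):
        nonlocal log_alpha
        # Predict: marginalise over previous positions, then correct with the new frame.
        pred = np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0)
        log_alpha = pred + log_b_t
        log_alpha -= np.logaddexp.reduce(log_alpha)  # renormalise for numerical stability
        return int(np.argmax(log_alpha))

    return step

# Usage sketch: step = online_follower(log_A, init_log_pi);
# for each incoming frame, pos = step(log_likelihoods_for_that_frame).
```

Because each update uses only the observations seen so far, the estimate is available with no lookahead, which is the property needed for live accompaniment.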
“…Since we aim to link the score and the performance at the section level rather than aiming directly at a note-to-note alignment, we avoid the modeling strategies presented in [5,6] and instead use a computationally lighter but still precise model for section linking.…”
Section: Introduction (mentioning)
confidence: 99%