Learning a robust Tonnetz-space transform for automatic chord recognition

Humphrey, Eric J.; Cho, Taemin; Bello, Juan Pablo

doi:10.1109/icassp.2012.6287914

Cited by 43 publications

(20 citation statements)

References 4 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Based on the characteristics of harmonic structures in the frequency domain, many effective techniques to extract chroma features have been proposed (e.g., [6], [7]). Furthermore, data-driven approaches have recently been considered to be promising [4], [8], [9].…”

Section: Related Workmentioning

confidence: 99%

Automatic Chord Estimation Based on a Frame-wise Convolutional Recurrent Neural Network with Non-Aligned Annotations

Carsault

Yoshii

2019

2019 27th European Signal Processing Conference (EUSIPCO)

View full text Add to dashboard Cite

This paper describes a weakly-supervised approach to Automatic Chord Estimation (ACE) task that aims to estimate a sequence of chords from a given music audio signal at the frame level, under a realistic condition that only non-aligned chord annotations are available. In conventional studies assuming the availability of time-aligned chord annotations, Deep Neural Networks (DNNs) that learn frame-wise mappings from acoustic features to chords have attained excellent performance. The major drawback of such frame-wise models is that they cannot be trained without the time alignment information. Inspired by a common approach in automatic speech recognition based on nonaligned speech transcriptions, we propose a two-step method that trains a Hidden Markov Model (HMM) for the forced alignment between chord annotations and music signals, and then trains a powerful frame-wise DNN model for ACE. Experimental results show that although the frame-level accuracy of the forced alignment was just under 90%, the performance of the proposed method was degraded only slightly from that of the DNN model trained by using the ground-truth alignment data. Furthermore, using a sufficient amount of easily collected non-aligned data, the proposed method is able to reach or even outperform the conventional methods based on ground-truth time-aligned annotations.

show abstract

Section: Related Workmentioning

confidence: 99%

Automatic Chord Estimation Based on a Frame-wise Convolutional Recurrent Neural Network with Non-Aligned Annotations

Carsault

Yoshii

2019

2019 27th European Signal Processing Conference (EUSIPCO)

View full text Add to dashboard Cite

show abstract

“…In particular, much time and energy has been invested in developing not just better features, but specifically better chroma features [8]. Acknowledging the challenges inherent to designing good features, Pachet et al pioneered work in automatic feature optimization [11], and more recently deep learning methods have been employed to produce robust Tonnetz features [4]. Alternatively, some work leverages the repetitive structure of music to smooth a chroma features prior to classification [1].…”

Section: Introductionmentioning

confidence: 99%

Rethinking Automatic Chord Recognition with Convolutional Neural Networks

Humphrey

Bello

2012

2012 11th International Conference on Machine Learning and Applications

Self Cite

View full text Add to dashboard Cite

Despite early success in automatic chord recognition, recent efforts are yielding diminishing returns while basically iterating over the same fundamental approach. Here, we abandon typical conventions and adopt a different perspective of the problem, where several seconds of pitch spectra are classified directly by a convolutional neural network. Using labeled data to train the system in a supervised manner, we achieve state of the art performance through this initial effort in an otherwise unexplored area. Subsequent error analysis provides insight into potential areas of improvement, and this approach to chord recognition shows promise for future harmonic analysis systems.

show abstract

“…Recently, parallel to other areas, deep learning has been successfully introduced to the MIR [14, 15]. A deep learning algorithm constructs multiple levels of data abstraction (a hierarchy of features) in order to model high-level representations present in the observed data [16].…”

Section: Introductionmentioning

confidence: 99%

Robust Real-Time Music Transcription with a Compositional Hierarchical Model

2017

View full text Add to dashboard Cite

The paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of input data, transparency, which enables insights into the learned representation, as well as robustness and speed which make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts. The hierarchical nature of the model corresponds well to hierarchical structures in music. The parts in lower layers correspond to low-level concepts (e.g. tone partials), while the parts in higher layers combine lower-level representations into more complex concepts (tones, chords). The layers are learned in an unsupervised manner from music signals. Parts in each layer are compositions of parts from previous layers based on statistical co-occurrences as the driving force of the learning process. In the paper, we present the model’s structure and compare it to other hierarchical approaches in the field of music information retrieval. We evaluate the model’s performance for the multiple fundamental frequency estimation. Finally, we elaborate on extensions of the model towards other music information retrieval tasks.

show abstract

Learning a robust Tonnetz-space transform for automatic chord recognition

Cited by 43 publications

References 4 publications

Automatic Chord Estimation Based on a Frame-wise Convolutional Recurrent Neural Network with Non-Aligned Annotations

Automatic Chord Estimation Based on a Frame-wise Convolutional Recurrent Neural Network with Non-Aligned Annotations

Rethinking Automatic Chord Recognition with Convolutional Neural Networks

Robust Real-Time Music Transcription with a Compositional Hierarchical Model

Contact Info

Product

Resources

About