Interspeech 2017
DOI: 10.21437/interspeech.2017-74

Multilingual i-Vector Based Statistical Modeling for Music Genre Classification

Abstract: In music signal processing, considering long-time features presents the time-series characteristics of the music signal better than strategies that model each short-time frame independently. As a typical long-time modeling strategy, the identification vector (i-vector) uses statistical modeling to represent the audio signal at the segment level. It can better capture the important elements of the music signal, and these important elements may benefit the classificatio…
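
The abstract describes segment-level i-vector modeling of frame-level music features. As a rough illustration of that idea (not the paper's actual pipeline), the sketch below fits a small GMM universal background model on frame-level MFCCs, collects posterior-weighted Baum-Welch statistics for each track, projects the resulting supervectors to a low-dimensional segment embedding, and trains a genre classifier on those embeddings. librosa, scikit-learn, the parameter values, and the use of PCA as a stand-in for a trained total-variability matrix T are all assumptions made for illustration.

```python
# Minimal i-vector-style sketch (illustrative, not the paper's exact method).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA
from sklearn.svm import SVC

N_COMP = 16    # UBM components (small, for illustration)
N_MFCC = 20    # frame-level feature dimension
IVEC_DIM = 32  # segment-level embedding dimension

def frame_features(path):
    """Frame-level MFCCs for one track (frames x dims)."""
    y, sr = librosa.load(path, sr=22050, duration=30.0)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T

def supervector(ubm, frames):
    """Posterior-weighted, mean-centered first-order statistics,
    stacked into one supervector per segment."""
    post = ubm.predict_proba(frames)            # frames x components
    n = post.sum(axis=0) + 1e-8                 # zeroth-order statistics
    f = post.T @ frames                         # components x dims
    centered = f / n[:, None] - ubm.means_      # center on UBM means
    return centered.ravel()

def train_genre_system(train_paths, train_labels):
    feats = [frame_features(p) for p in train_paths]
    ubm = GaussianMixture(n_components=N_COMP, covariance_type="diag",
                          max_iter=100).fit(np.vstack(feats))
    supervecs = np.array([supervector(ubm, f) for f in feats])
    # PCA used here as a crude stand-in for the total-variability matrix T.
    tv = PCA(n_components=min(IVEC_DIM, len(supervecs))).fit(supervecs)
    clf = SVC(kernel="linear").fit(tv.transform(supervecs), train_labels)
    return ubm, tv, clf

def classify(path, ubm, tv, clf):
    sv = supervector(ubm, frame_features(path))
    return clf.predict(tv.transform(sv[None, :]))[0]
```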

Cited by 3 publications (5 citation statements); citing publications span 2019–2024.
References 12 publications.
“…These two models are used to extract a feature vector from a variable-length audio signal. In addition to speaker identification tasks, these models are used in other areas such as language identification [25], emotion recognition [26], music genre classification [27], and online signature verification [28].…”
Section: Proposed Framework
confidence: 99%
“…As can be seen, the combination of the bass, vocals and accompaniment without velocities gives us the best performance. The best of these scores are either equal to or better than the peak performance of [16].…”
Section: Chapter 5 Results
confidence: 99%
“…The overall architecture of the system follows the process described there. Some of the ideas explained in [16], extracting spectral features from the audio track itself, using a deep neural network and supplying the result to a classifier, have been adopted extensively within this thesis. Drawing on the aforementioned publication, thirty-second clips from the start of every track are used as input to the proposed method.…”
Section: Proposed Methods
confidence: 99%
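
The passage quoted above describes feeding spectral features computed from thirty-second clips into a neural-network classifier. A minimal sketch of that kind of front end is shown below; librosa and the specific parameter values are assumptions, not taken from the cited thesis.

```python
# Sketch: log-mel spectrogram of the first 30 s of a track, ready to be
# fed to a neural-network classifier (parameter values are illustrative).
import numpy as np
import librosa

def thirty_second_logmel(path, sr=22050, n_mels=128):
    """Log-mel spectrogram (mels x frames) of the first 30 s of the track."""
    y, _ = librosa.load(path, sr=sr, duration=30.0)  # clip from track start
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)
```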