2014
DOI: 10.1109/tmm.2014.2311016

A Systematic Evaluation of the Bag-of-Frames Representation for Music Information Retrieval


Cited by 42 publications (37 citation statements)
References 36 publications
“…That is to standardize audio data format into Wav format and to split the original audio data into 30 sec long. In work done by Su et al (2014) it is stated that the reasons of using 30 seconds long audio data is to avoid wrong features extraction which may lead to inaccuracy of training and testing result.…”
Section: Methods (mentioning, confidence: 99%)
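The preprocessing described in this statement (WAV audio, cut into 30-second pieces) can be sketched with Python's standard `wave` module. This is a minimal sketch under stated assumptions: the input is already a WAV file (converting MP3 or other codecs requires an external decoder and is not shown), a trailing remainder shorter than 30 s is simply dropped (the statement does not say how remainders are handled), and the function name `split_wav` is hypothetical.

```python
import io
import wave


def split_wav(src, segment_seconds=30):
    """Hypothetical helper: split a WAV file into consecutive fixed-length segments.

    `src` may be a path or a binary file-like object. Returns a list of
    in-memory WAV files (bytes), each exactly `segment_seconds` long.
    Assumption: a trailing remainder shorter than `segment_seconds` is dropped.
    """
    segments = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_segment = params.framerate * segment_seconds
        n_full = params.nframes // frames_per_segment  # whole segments only
        for _ in range(n_full):
            data = reader.readframes(frames_per_segment)
            buf = io.BytesIO()
            # Copy the source format (channels, sample width, rate) unchanged.
            with wave.open(buf, "wb") as writer:
                writer.setnchannels(params.nchannels)
                writer.setsampwidth(params.sampwidth)
                writer.setframerate(params.framerate)
                writer.writeframes(data)
            segments.append(buf.getvalue())
    return segments
```

For example, a 65-second clip yields two 30-second segments, and the final 5 seconds are discarded.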
“…Refs. [18] and [19] show that spectrogram is the best local representation for sparse coding based audio-word type features comparing to Mel-spectrum, MFCC, Sonogram, and Constant-Q transform.…”
Section: Baseline Audio-word Extraction Pipeline (mentioning, confidence: 99%)
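To make the pipeline in this statement concrete, the sketch below computes a magnitude spectrogram (the local representation the statement favors) and encodes each frame as a sparse vector of audio-word activations against a dictionary. It is a minimal sketch under stated assumptions: the frame length, hop size, Hann window, regularization weight `lam`, and the helper names are illustrative and hypothetical, the dictionary is random rather than learned, and ISTA (iterative shrinkage-thresholding) is swapped in as a generic LASSO solver. The cited works may use a different solver and dictionary-learning procedure.

```python
import numpy as np


def magnitude_spectrogram(x, frame_len=1024, hop=512):
    """Hypothetical helper: Hann-windowed frames -> |FFT| per frame.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def sparse_code(X, D, lam=0.1, n_iter=200):
    """Hypothetical helper: encode each row of X against dictionary D via ISTA.

    D holds one atom (audio-word) per row. For every frame x this solves
        min_a 0.5 * ||x - a D||^2 + lam * ||a||_1
    and returns the codes as an array of shape (n_frames, n_atoms).
    """
    # Step size 1/L, where L is the largest eigenvalue of D D^T
    # (the squared spectral norm of D).
    L = np.linalg.norm(D, ord=2) ** 2
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T          # gradient of the squared-error term
        Z = A - grad / L                  # gradient step
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A
```

In practice the dictionary would be learned from training spectrogram frames; each row of the returned code matrix is then the audio-word activation vector for one frame.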
“…Late Temporal Pooling: In our previous work [18], [19], [22], we have considered two types of late temporal pooling scheme: Bag-of-Frames (BoF) and Histogram based Bag-of-Segments (HBoS). BoF is the most common and simple way to pool audio-word-type features [14], [16], [22], where the audio-words for a given music clip are summed holistically over the whole clip.…”
Section: Baseline Audio-word Extraction Pipeline (mentioning, confidence: 99%)
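Bag-of-Frames pooling as described in this statement reduces to a sum of the frame-level audio-word activations over the whole clip. A minimal sketch, assuming the codes are stored as an `(n_frames, n_words)` NumPy array (for instance the output of a sparse coder); `bag_of_frames` is a hypothetical name, and any normalization of the pooled vector, which the statement does not mention, is left out.

```python
import numpy as np


def bag_of_frames(codes):
    """Hypothetical helper: pool frame-level audio-words into one clip vector.

    codes: array of shape (n_frames, n_words), one row of activations per frame.
    Returns a vector of length n_words holding the activations summed over
    every frame of the clip, so frame order is discarded.
    """
    return np.asarray(codes, dtype=float).sum(axis=0)
```

Because the sum ignores frame order, shuffling the frames of a clip yields the same clip-level vector.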