Towards a more efficient sparse coding based audio-word feature extraction system

Yeh, Chin‐Chia Michael; Yang, Yi-Hsuan

doi:10.1109/apsipa.2013.6694252

Cited by 3 publications

(3 citation statements)

References 27 publications


“…The values for these metrics all fall within [0,1], and larger values indicate better performance. For each tag, we rank the test clips in descending order of the decision values computed by SVM and calculate the above measures according to the ranking [16]. We select only one exemplar for each frame in (2) with , and use the voting-based method in (3), such that .…”

Section: Methods (mentioning)

confidence: 99%

“…to solve for ) [23], [46], and temporal pooling methods (i.e. to get ) [16], [47] have been proposed and compared. The focus of this letter, however, is to investigate efficient ways of exploiting unlabeled exemplars themselves as the dictionary atoms, a topic that is seldom addressed before.…”

Section: A Clip-level Lasso Screening (mentioning)

confidence: 99%

“…In the past decade, a great effort has been made to use supervised machine learning algorithms to map signal-level audio features extractable by machine (e.g. temporal or spectral features) to high-level semantic labels using manually pre-labeled training samples [8]-[16]. The task, however, remains challenging due to the following three issues: the scarcity of well-labeled training data [17], [18], the complexity involved in formalizing and evaluating the task while taking care of possible confounds [18], [19], and the difficulty of extracting good audio features that capture the characteristics of each tag [20]-[24].…”

Section: Introduction (mentioning)

confidence: 99%


Music Annotation and Retrieval using Unlabeled Exemplars: Correlation and Sparse Codes

Jao; Yang

2015, IEEE Signal Process. Lett.

Self Cite


Tagging music signals with semantic labels such as genres, moods and instruments is important for content-based music retrieval and recommendation. While considerable effort has been made, automatic music annotation is still considered challenging due to the difficulty of extracting good audio features that capture the characteristics of different tags. To address this issue, we present in this letter two exemplar-based approaches that represent the content of a music clip by referring to a large set of unlabeled audio exemplars. The first approach represents a music clip by the set of audio exemplars that is highly correlated with the short-time feature vectors of the clip, whereas the second approach represents a music clip as sparse linear combinations of its short-time feature vectors over the audio exemplars. Music annotation is then performed by learning the relevance of the audio examples to different tags using labeled data. These two approaches effectively capitalize on the availability of unlabeled data to explore the commonality of music signals to find out tag-specific acoustic patterns, without domain knowledge and feature design. Evaluation on the CAL10k music genre tagging dataset for tag-based music retrieval shows that, with thousands of unlabeled audio examples randomly drawn from the Million Song Dataset, the proposed approaches lead to remarkably higher precision rates than existing approaches.
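The first approach in the abstract above (representing a clip by the exemplars most correlated with its short-time feature vectors) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes cosine similarity as the correlation measure, a top-k selection per frame (k = 1, matching the "one exemplar for each frame" setting quoted in the first citation statement), and a normalized vote histogram as the clip-level feature. The names `cosine`, `vote_histogram`, and `top_k` are hypothetical, and the toy vectors are made up for illustration. The second, sparse-coding approach is not sketched here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed correlation measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def vote_histogram(frames, exemplars, top_k=1):
    """Clip-level feature: each frame votes for its top_k most similar exemplars.

    Returns a list with one entry per exemplar, normalized to sum to 1
    (or all zeros if the clip has no frames).
    """
    counts = [0.0] * len(exemplars)
    for frame in frames:
        scores = [cosine(frame, exemplar) for exemplar in exemplars]
        # Indices of exemplars sorted by descending similarity to this frame.
        ranked = sorted(range(len(exemplars)), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


# Toy data: three unlabeled exemplars and a clip with three short-time frames.
exemplars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = [[2.0, 0.1], [0.1, 3.0], [0.2, 5.0]]
hist = vote_histogram(frames, exemplars, top_k=1)
print(hist)
```

The resulting histogram has one dimension per exemplar, so a tag classifier trained on labeled clips (the citation statement above mentions an SVM) can learn which exemplars are relevant to which tag.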

