Audio retrieval by latent perceptual indexing

Sundaram, Shiva; Narayanan, Shrikanth

doi:10.1109/icassp.2008.4517543

Cited by 26 publications

(29 citation statements)

References 4 publications


“…For this reason, Section 2 addresses the uncovering of an ontology from the tags [14] in an unsupervised form, to investigate whether such an ontology is not an imposed construction. Because a latent structure has been assumed, we use a technique called vector-based semantic analysis, which is a generalization of Latent Semantic Analysis [15] and similar to the methods used in latent semantic mapping [16] and latent perceptual indexing [17]. Thus, although some of the terminology is borrowed from these areas, our method is also different in several crucial respects.…”

Section: Social Tagging (mentioning)

confidence: 99%

Semantic structures of timbre emerging from social and acoustic descriptions of music

Ferrer

Eerola

2011

J AUDIO SPEECH MUSIC PROC.


The perceptual attributes of timbre have inspired a considerable amount of multidisciplinary research, but because of the complexity of the phenomena, the approach has traditionally been confined to laboratory conditions, much to the detriment of its ecological validity. In this study, we present a purely bottom-up approach for mapping the concepts that emerge from sound qualities. A social media (http://www.last.fm) is used to obtain a wide sample of verbal descriptions of music (in the form of tags) that go beyond the commonly studied concept of genre, and from this the underlying semantic structure of this sample is extracted. The structure that is thereby obtained is then evaluated through a careful investigation of the acoustic features that characterize it. The results outline the degree to which such structures in music (connected to affects, instrumentation and performance characteristics) have particular timbral characteristics. Samples representing these semantic structures were then submitted to a similarity rating experiment to validate the findings. The outcome of this experiment strengthened the discovered links between the semantic structures and their perceived timbral qualities. The findings of both the computational and behavioural parts of the experiment imply that it is therefore possible to derive useful and meaningful structures from free verbal descriptions of music, that transcend musical genres, and that such descriptions can be linked to a set of acoustic features. This approach not only provides insights into the definition of timbre from an ecological perspective, but could also be implemented to develop applications in music information research that organize music collections according to both semantic and sound qualities.


“…Sparse codes are also known to be analogous to the coding mechanism in neural sensory system (see [97] and references therein). Interestingly, mathematical analogy between sparse representation of data and dimension reduction using matrix factorization (used in the bag-of-units representation by [93]) has also been observed in the context of speech recognition (the reader is referred to [102] for an overview). Starting from perceptual features, retrieval techniques using these representations can therefore be used to further render higher level sensory processes in the auditory system.…”

Section: A Semantic Audio Retrieval (mentioning)

confidence: 90%

“…By modeling audio as a collection of units, the approach is able to scale to (arbitrary) collection of audio clips. Notable examples include latent perceptual indexing by Sundaram et al [93], [94], the related anchor-space model by Lu et al [95] and Lee et al [96] and the bag-of-patterns representation used by Lyon et al [97]. These techniques formalize the method discussed in Slaney et al [98] where a form of unit-document co-occurrence is implicitly used for semantic information extraction from audio.…”

Section: A Semantic Audio Retrieval (mentioning)

confidence: 99%

“…In Sundaram et al [93], the authors use the centroids of clusters of signal features as acoustic units and subsequently derive unit-document frequencies between the centroids and features extracted from audio clips. While the initial work was performed on semantic categories, this approach has also been investigated for onomatopoeic categorization of audio [94].…”

Section: A Semantic Audio Retrieval (mentioning)

confidence: 99%
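The pipeline described in the statement above (cluster centroids as acoustic units, then unit-document frequencies per clip) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the 2-D synthetic "frames" stand in for real signal features, the plain k-means routine, the truncated SVD step (borrowed from the latent semantic analysis analogy drawn elsewhere in these statements), and every function name and parameter value are assumptions made for the example.

```python
import numpy as np


def nearest_unit(frames, centroids):
    """Index of the closest centroid (acoustic unit) for every frame."""
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(frames, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids act as the acoustic units."""
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest_unit(frames, centroids)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def unit_document_matrix(clips, centroids):
    """Rows are units, columns are clips; entries count frames per unit."""
    counts = np.zeros((len(centroids), len(clips)))
    for c, frames in enumerate(clips):
        labels = nearest_unit(frames, centroids)
        counts[:, c] = np.bincount(labels, minlength=len(centroids))
    return counts


def latent_embedding(counts, rank):
    """Truncated SVD of the unit-document matrix; one row per clip."""
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    return vt[:rank].T * s[:rank]


def rank_by_cosine(embeddings, query):
    """Other clips ordered by cosine similarity to the query clip."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    order = np.argsort(-(unit @ unit[query]))
    return [int(i) for i in order if i != query]


# Hypothetical data: three clips drawn around (0, 0), three around (10, 10).
rng = np.random.default_rng(0)
clips = [rng.normal(0.0, 1.0, size=(50, 2)) for _ in range(3)]
clips += [rng.normal(10.0, 1.0, size=(50, 2)) for _ in range(3)]

centroids = kmeans(np.vstack(clips), k=4)
counts = unit_document_matrix(clips, centroids)
embeddings = latent_embedding(counts, rank=2)
ranking = rank_by_cosine(embeddings, query=0)
```

With this toy data, clips generated from the same distribution as the query are expected to rank ahead of the others. A real system would replace the synthetic frames with features extracted from audio and would likely weight the counts before factorization; neither choice is specified in the statements quoted here.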


An Overview on Perceptually Motivated Audio Indexing and Classification

Richard

Sundaram

Narayanan

2013

Proc. IEEE

Self Cite


An audio indexing system aims at describing audio content by identifying, labeling or categorizing different acoustic events. Since the resulting audio classification and indexing is meant for direct human consumption, it is highly desirable that it produces perceptually relevant results. This can be obtained by integrating specific knowledge of the human auditory system in the design process to various extent. In this paper, we highlight some of the important concepts used in audio classification and indexing that are perceptually motivated or that exploit some principles of perception. In particular, we discuss several different strategies to integrate human perception including 1) the use of generic audition models, 2) the use of perceptually-relevant features for the analysis stage that are perceptually justified either as a component of a hearing model or as being correlated with a perceptual dimension of sound similarity, and 3) the involvement of the user in the audio indexing or classification task. In the paper, we also illustrate some of the recent trends in semantic audio retrieval that approximate higher level perceptual processing and cognitive aspects of human audio recognition capabilities including affect-based audio retrieval.


“…Although audio analysis has been widely studied in scene classification [8,9,10], audio segmentation [11,12,13], and audio retrieval [14,15,16], to our knowledge, automatic audio tagging has not been much explored. Bertin-Mahieux et al [17] treated audio tag prediction as a set of binary classification problems and applied the Adaboost algorithm to the task.…”

Section: Introduction (mentioning)

confidence: 99%

Fast tagging of natural sounds using marginal co-regularization

Huang

Jackson

et al. 2017

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Automatic and fast tagging of natural sounds in audio collections is a very challenging task due to wide acoustic variations, the large number of possible tags, the incomplete and ambiguous tags provided by different labellers. To handle these problems, we use a co-regularization approach to learn a pair of classifiers on sound and text. The first classifier maps low-level audio features to a true tag list. The second classifier maps actively corrupted tags to the true tags, reducing incorrect mappings caused by low-level acoustic variations in the first classifier, and to augment the tags with additional relevant tags. Training the classifiers is implemented using marginal co-regularization, pair of which draws the two classifiers into agreement by a joint optimization. We evaluate this approach on two sound datasets, Freefield1010 and Task4 of DCASE2016. The results obtained show that marginal co-regularization outperforms the baseline GMM in both efficiency and effectiveness.

