An evolutionary feature synthesis approach for content-based audio retrieval

Mäkinen, Timo; Kıranyaz, Serkan; Raitoharju, Jenni; Gabbouj, Moncef

doi:10.1186/1687-4722-2012-23

Cited by 10 publications

(7 citation statements)

References 32 publications


“…Their process includes finding the decomposition of a signal from a dictionary of atoms, which would yield the best set of functions to form an approximate representation. More recently, Mäkinen et al [33] propose an evolutionary feature synthesis technique to enhance common audio descriptors by using multidimensional particle swarm optimization to search for the optimal feature synthesis parameters. These works forward the auditory scene research; however, not all the unpredictable structures of auditory environments can be directly discriminated by a low-level feature subset.…”

Section: Related Work (mentioning)

confidence: 99%

Context-based environmental audio event recognition for scene understanding

Lü; Wang

2014

Multimedia Systems


To the best of our knowledge, this is the first work that models event correlations as scene context for robust audio event detection in complex and noisy environments. Note that, according to a recent report, the mean accuracy of human listeners on the acoustic scene classification task is only around 71% on the data collected in office environments from the DCASE dataset. None of the existing methods performs well on all scene categories, and the average accuracy of the best performances of the 11 recent methods is 53.8%. The proposed method achieves an average accuracy of 62.3% on the same dataset. Additionally, we create a 10-CASE dataset by manually collecting 5,250 audio clips of 10 scene types and 21 event categories. Our experimental results on 10-CASE show that the proposed method achieves an improved average performance of 78.3%, and that the average accuracy of audio event recognition can be effectively improved by capturing dominant audio sources and inferring non-dominant events from the dominant ones through acoustic context modeling. In future work, exploring the interactions between acoustic scene recognition and audio event detection, and incorporating other modalities to improve accuracy, are required to further advance the proposed framework.


“…A multi-objective EA with the target to minimise the number of selected features and the classification error was presented in [29]. EAs have proven their ability to generate new features for music classification by exploring nearly unlimited search spaces of combinations of different transforms and mathematical operations [22,19,15]. EAs have also been successfully applied for training set augmentation in bioinformatics, where the generation of new training data may be very expensive [11,32], and for training set selection (TSS) [6].…”

Section: Multi-objective Evolutionary Optimisation and Training Set Selection (mentioning)

confidence: 99%

Evolutionary Multi-objective Training Set Selection of Data Instances and Augmentations for Vocal Detection

Vatolkin; Stoller

2019

Computational Intelligence in Music, Sound, Art and Design


The size of publicly available music data sets has grown significantly in recent years, which allows training better classification models. However, training on large data sets is time-intensive and cumbersome, and some training instances might be unrepresentative and thus hurt classification performance regardless of the used model. On the other hand, it is often beneficial to extend the original training data with augmentations, but only if they are carefully chosen. Therefore, identifying a "smart" selection of training instances should improve performance. In this paper, we introduce a novel, multi-objective framework for training set selection with the target to simultaneously minimise the number of training instances and the classification error. Experimentally, we apply our method to vocal activity detection on a multi-track database extended with various audio augmentations for accompaniment and vocals. Results show that our approach is very effective at reducing classification error on a separate validation set, and that the resulting training set selections either reduce classification error or require only a small fraction of training instances for comparable performance.
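Both this abstract and the preceding citation statement describe evolutionary searches with two objectives to be minimised at once (the number of selected items and the classification error). Such trade-offs are commonly compared through Pareto dominance. The sketch below is a minimal, generic nondominance filter, assuming both objectives are minimised; it is not the selection scheme of the cited paper, and the names `dominates` and `pareto_front` as well as the scores in the usage example are hypothetical, made up for illustration rather than taken from any reported results.

```python
from typing import List, Tuple

# Each candidate is scored on two objectives, both to be minimised:
# (number of selected training instances, validation classification error).
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Score]) -> List[int]:
    """Indices of the candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical candidate training-set selections (size, error).
    scores = [(100, 0.20), (50, 0.22), (50, 0.25), (200, 0.18), (150, 0.21)]
    print(pareto_front(scores))  # → [0, 1, 3]
```

Candidate 2 is dropped because candidate 1 uses the same number of instances with lower error, and candidate 4 is dropped because candidate 0 is smaller and more accurate; the remaining three are incomparable trade-offs.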


“…The proposed EFS system has been previously successfully applied on audio features [1]. An initial version of the proposed EFS has been also applied on image features in [2], but the fitness functions were not generic nor applicable in real life as they required classification or retrieval over the whole EFS dataset for every fitness evaluation.…”

Section: Introduction (mentioning)

confidence: 99%

Evolutionary feature synthesis by multi-dimensional particle swarm optimization

Raitoharju; Kıranyaz; Gabbouj

2014

2014 5th European Workshop on Visual Information Processing (EUVIP)

Self Cite


Several existing content-based image retrieval and classification systems rely on low-level features which are automatically extracted from images. However, often these features lack the discrimination power needed for accurate description of the image content and hence they may lead to a poor retrieval or classification performance. This article applies an evolutionary feature synthesis method based on multidimensional particle swarm optimization on low-level image features to enhance their discrimination ability. The proposed method can be applied on any database and low-level features as long as some ground-truth information is available. Content-based image retrieval experiments show that a significant performance improvement can be achieved.
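This abstract, like the first citation statement, relies on particle swarm optimization to search for feature-synthesis parameters. As a rough illustration only, the sketch below is a plain global-best PSO over a fixed-dimensional box. It is not the multi-dimensional (MD-PSO) variant the authors use, which additionally lets particles move across search-space dimensions. The function name `pso_minimize` and the coefficient defaults (`w`, `c1`, `c2`) are assumptions chosen for illustration. In an EFS setting, the objective `f` would be a fitness scoring the discrimination power of the synthesized features; here a simple sphere function stands in.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(
    f: Callable[[List[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 20,
    iters: int = 200,
    w: float = 0.7,   # inertia weight (illustrative default)
    c1: float = 1.5,  # pull toward each particle's personal best
    c2: float = 1.5,  # pull toward the swarm's global best
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Plain global-best PSO over a fixed-dimensional box (not MD-PSO)."""
    rng = random.Random(seed)
    lo, hi = bounds
    vmax = hi - lo  # velocity clamp: at most one box width per step

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                vel[i][d] = max(-vmax, min(vmax, v))
                # Move the particle and keep it inside the search box.
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    sphere = lambda x: sum(t * t for t in x)  # stand-in fitness, minimum 0 at the origin
    best, val = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
    print(best, val)
```

With these settings the returned value should typically be very close to zero; a real feature-synthesis fitness would be far more expensive per evaluation, which is why the cited works care about how the fitness is computed.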
