Sparse coding of auditory features for machine hearing in interference

Lyon, Richard F.; Ponte, Jay; Chechik, Gal

doi:10.1109/icassp.2011.5947698

Cited by 6 publications

(14 citation statements)

References 8 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…By default, the nonlinear frequency resolution is 78 and the time lag resolution is 561. In [4], multiple rectangular regions were extracted from each SAI according to a 'start-small and then double' heuristic outlined in [6], starting with an initial window of size D = 16 by B = 32. In total R = 49 rectangular regions were extracted and then downsampled to match the D × B size of the initial window, and the marginals of each region computed as discussed in Section II-A to yield a representative feature vector of size D + B, i.e.…”

Section: A Sai Featuresmentioning

confidence: 99%

“…To further improve efficiency, the large-scale systems [4] perform vector quantisation (VQ) [16] or matching pursuit [34] on each rectangle, and represent the output as a sparse code. In this paper, both VQ and non-VQ results will be presented.…”

Section: A Sai With Pamirmentioning

confidence: 99%

“…Where VQ is used, performance tends to increase with codebook size, saturating at a size of around 512. A size of 256 was used in [4].…”

Section: A Sai With Pamirmentioning

confidence: 99%

“…16 + 32 = 48 per SAI frame. For the Google-SAI PAMIR-based recall systems [4], the features from each rectangular region are vector quantized using a separate codebook of size 256 for each feature. A sparse representation is then constructed by concatenating the 49 VQ output vectors, forming a super-vector of length 12, 544.…”

Section: A Sai Featuresmentioning

confidence: 99%

“…The SAI features used in this work are derived as defined in [4] using AIM-C [24], extracted over 35 ms windows to yield a real-valued matrix for each analysis frame. By default, the nonlinear frequency resolution is 78 and the time lag resolution is 561.…”

Section: A Sai Featuresmentioning

confidence: 99%

See 4 more Smart Citations

Robust Sound Event Classification Using Deep Neural Networks

McLoughlin

Zhang

Xie

et al. 2015

IEEE/ACM Trans. Audio Speech Lang. Process.

238

177

View full text Add to dashboard Cite

Abstract-The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogrambased or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and shown to compare very well with current state-of-the-art classification techniques.

show abstract

Section: A Sai Featuresmentioning

confidence: 99%

Section: A Sai With Pamirmentioning

confidence: 99%

“…Where VQ is used, performance tends to increase with codebook size, saturating at a size of around 512. A size of 256 was used in [4].…”

Section: A Sai With Pamirmentioning

confidence: 99%

Section: A Sai Featuresmentioning

confidence: 99%

Section: A Sai Featuresmentioning

confidence: 99%

See 3 more Smart Citations

Robust Sound Event Classification Using Deep Neural Networks

McLoughlin

Zhang

Xie

et al. 2015

IEEE/ACM Trans. Audio Speech Lang. Process.

238

177

View full text Add to dashboard Cite

show abstract

Learning representations of sound using trainable COPE feature extractors

Strisciuglio

Vento

Petkov

2019

Pattern Recognition

View full text Add to dashboard Cite

Sound analysis research has mainly been focused on speech and music processing. The deployed methodologies are not suitable for analysis of sounds with varying background noise, in many cases with very low signal-to-noise ratio (SNR).In this paper, we present a method for the detection of patterns of interest in audio signals. We propose novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy). The structure of a COPE feature extractor is determined using a single prototype sound pattern in an automatic configuration process, which is a type of representation learning. We construct

show abstract

Subband autocorrelation features for video soundtrack classification

Cotton

Ellis

2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

View full text Add to dashboard Cite

Inspired by prior work on stabilized auditory image features, we have developed novel auditory-model-based features that preserve the fine time structure lost in conventional frame-based features. While the original auditory model is computationally intense, we present a simpler system that runs about ten times faster but achieves equivalent performance. We use these features for video soundtrack classification with the Columbia Consumer Video dataset, showing that the new features alone are roughly comparable to traditional MFCCs, but combining classifiers based on both features achieves a 15% improvement in mean Average Precision over the MFCC baseline.

show abstract

Sparse coding of auditory features for machine hearing in interference

Cited by 6 publications

References 8 publications

Robust Sound Event Classification Using Deep Neural Networks

Robust Sound Event Classification Using Deep Neural Networks

Learning representations of sound using trainable COPE feature extractors

Subband autocorrelation features for video soundtrack classification

Contact Info

Product

Resources

About