2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2011
DOI: 10.1109/icassp.2011.5947698

Sparse coding of auditory features for machine hearing in interference

Abstract: A key problem in using the output of an auditory model as the input to a machine-learning system in a machine-hearing application is to find a good feature-extraction layer. For systems such as PAMIR (passive-aggressive model for image retrieval) that work well with a large sparse feature vector, a conversion from auditory images to sparse features is needed. For audio-file ranking and retrieval from text queries, based on stabilized auditory images, we took a multi-scale approach, using vector quantization to…

Citations: cited by 6 publications (14 citation statements)
References: 8 publications
“…By default, the nonlinear frequency resolution is 78 and the time lag resolution is 561. In [4], multiple rectangular regions were extracted from each SAI according to a 'start-small and then double' heuristic outlined in [6], starting with an initial window of size D = 16 by B = 32. In total R = 49 rectangular regions were extracted and then downsampled to match the D × B size of the initial window, and the marginals of each region computed as discussed in Section II-A to yield a representative feature vector of size D + B, i.e.…”
Section: A. SAI Features (mentioning)
confidence: 99%
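
The region-to-feature step described in the quoted passage can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: it assumes the SAI is a 2-D NumPy array with 78 rows and 561 columns, that D counts rows and B counts columns, that downsampling is done by block averaging, and that each marginal is a mean along one axis. The passage states none of these details, and the helper names `downsample_mean` and `box_features` are hypothetical. The 'start-small and then double' layout of the R = 49 regions is not reproduced; only a single rectangle is processed.

```python
import numpy as np

D, B = 16, 32  # initial window size from the quoted passage (rows x columns is an assumption)

def downsample_mean(box, out_rows, out_cols):
    """Shrink a 2-D array by averaging equal-sized blocks (assumed method)."""
    rows, cols = box.shape
    if rows % out_rows or cols % out_cols:
        raise ValueError("box size must be a multiple of the target size")
    blocks = box.reshape(out_rows, rows // out_rows, out_cols, cols // out_cols)
    return blocks.mean(axis=(1, 3))

def box_features(sai, top, left, height, width):
    """Cut one rectangle, downsample it to D x B, and concatenate its marginals."""
    box = sai[top:top + height, left:left + width]
    small = downsample_mean(box, D, B)
    row_marginal = small.mean(axis=1)  # one value per row, length D
    col_marginal = small.mean(axis=0)  # one value per column, length B
    return np.concatenate([row_marginal, col_marginal])  # length D + B

rng = np.random.default_rng(0)
sai = rng.random((78, 561))  # placeholder data at the default resolutions quoted above
features = box_features(sai, top=0, left=0, height=32, width=64)
```

Here a 32 by 64 rectangle (the initial window doubled in both directions) is reduced to 16 by 32 and then to a vector of D + B = 48 values, matching the feature size given in the passage.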
“…To further improve efficiency, the large-scale systems [4] perform vector quantisation (VQ) [16] or matching pursuit [34] on each rectangle, and represent the output as a sparse code. In this paper, both VQ and non-VQ results will be presented.…”
Section: A. SAI with PAMIR (mentioning)
confidence: 99%
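
The vector-quantisation step mentioned in this statement can be illustrated with a small sketch. It assumes one codebook per rectangle, Euclidean nearest-codeword assignment, and a one-hot output per rectangle concatenated into a single sparse vector. The codebook layout, the sizes used in the example, and the function name `vq_sparse_code` are hypothetical, and the matching-pursuit alternative is not shown.

```python
import numpy as np

def vq_sparse_code(region_features, codebooks):
    """One-hot encode each region's nearest codeword and concatenate the results.

    region_features: (R, F) array, one feature vector per rectangle.
    codebooks: (R, K, F) array, a separate K-entry codebook per rectangle (assumed layout).
    Returns the indices of the non-zero entries and the dense vector of length R * K.
    """
    R, K, _ = codebooks.shape
    # Squared Euclidean distance from each region's vector to each of its codewords.
    dists = ((codebooks - region_features[:, None, :]) ** 2).sum(axis=2)  # shape (R, K)
    nearest = dists.argmin(axis=1)  # winning codeword per region, shape (R,)
    active = np.arange(R) * K + nearest  # position of each winner in the full vector
    dense = np.zeros(R * K)
    dense[active] = 1.0
    return active, dense

rng = np.random.default_rng(0)
codebooks = rng.random((3, 8, 48))  # illustrative sizes: 3 regions, 8 codewords, 48 features
# Use exact codewords as inputs so the expected winners are known in advance.
region_features = codebooks[[0, 1, 2], [5, 0, 7]]
active, dense = vq_sparse_code(region_features, codebooks)
```

Only one entry per rectangle is non-zero, so a learner that works well with sparse inputs needs just the `active` indices rather than the dense vector.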