2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2013.6639357

Subband autocorrelation features for video soundtrack classification

Abstract: Inspired by prior work on stabilized auditory image features, we have developed novel auditory-model-based features that preserve the fine time structure lost in conventional frame-based features. While the original auditory model is computationally intense, we present a simpler system that runs about ten times faster but achieves equivalent performance. We use these features for video soundtrack classification with the Columbia Consumer Video dataset, showing that the new features alone are roughly comparable…
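The abstract describes features that retain fine time structure by computing short-time autocorrelations within bandpass subbands. The sketch below illustrates that general pipeline in Python; the filterbank layout, frame/hop sizes, and lag count are illustrative assumptions, and the per-band PCA stage that would normally compact the dimensionality is only indicated in a comment — this is not the authors' exact implementation.

```python
import numpy as np
from scipy.signal import butter, lfilter

def subband_autocorr_features(x, sr, n_bands=24, frame_len=0.025, hop=0.010, n_lags=200):
    """Toy subband-autocorrelation features: band-pass filterbank, then a
    short-time normalized autocorrelation per band, stacked per frame.
    Band edges, frame/hop sizes, and lag count are illustrative choices,
    not the configuration used in the paper."""
    # Log-spaced band edges between 100 Hz and just below Nyquist (assumption)
    edges = np.logspace(np.log10(100.0), np.log10(0.95 * sr / 2), n_bands + 1)
    frame = int(frame_len * sr)
    hop_n = int(hop * sr)

    # Filter the signal into subbands with simple 2nd-order Butterworth bandpasses
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        bands.append(lfilter(b, a, x))

    feats = []
    for start in range(0, len(x) - frame, hop_n):
        per_band = []
        for y in bands:
            seg = y[start:start + frame]
            # Full autocorrelation; keep non-negative lags 0..n_lags-1
            ac = np.correlate(seg, seg, mode="full")[frame - 1:frame - 1 + n_lags]
            per_band.append(ac / (ac[0] + 1e-12))   # normalize by zero-lag energy
        feats.append(np.concatenate(per_band))       # (n_bands * n_lags,) per frame
    # A per-band PCA projection would typically follow to reduce dimensionality
    return np.array(feats)
```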

Cited by 5 publications (5 citation statements) · References 11 publications
“…The relationships are then used in turn to adjust the network weights for improved classification performance. Using the trace norm allows us to derive the analytical solutions in Equations 10 and 11, which satisfies our goal of learning the relationships Ψ and Ω based on W. More specifically, the training procedure of the proposed method is summarized in Algorithm 1. For each epoch, additional effort is required to compute the gradient matrix G^m_l for updating W^m_l, as well as to update the matrices Ω and Ψ.…”
Section: End For
confidence: 99%
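The statement refers to closed-form ("analytical") updates for the relationship matrices Ψ and Ω derived from a trace-norm formulation. As a hedged illustration only, the sketch below shows the standard multi-task-learning closed form Ω = (WᵀW)^{1/2} / tr((WᵀW)^{1/2}) that such regularizers admit; the cited paper's actual Equations 10 and 11 are not reproduced here and may differ in detail.

```python
import numpy as np

def relationship_update(W, eps=1e-8):
    """Generic closed-form update for a relationship matrix from a weight
    matrix W (columns = tasks/units), of the kind that trace-norm
    regularizers tr(W Omega^{-1} W^T) yield:
        Omega = (W^T W)^{1/2} / tr((W^T W)^{1/2})
    Shown only to illustrate what an 'analytical solution' for Omega
    looks like; it is not taken from the cited paper."""
    M = W.T @ W
    # Matrix square root via eigendecomposition (M is symmetric PSD)
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)
    root = (vecs * np.sqrt(vals)) @ vecs.T
    return root / (np.trace(root) + eps)
```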
“…Most of the existing work focused on developing effective features [10,24,46], novel recognition methods [29,39], or comprehensive systems that integrate multiple features and classifiers for competitive classification performance [23,30]. Besides accuracy, efficiency is another important factor that should be considered in the design of a modern video classification system.…”
Section: Related Work
confidence: 99%
“…Most approaches followed a very standard pipeline, where various features are first extracted and then used as inputs of classifiers. Many works have focused on the design of novel features, such as the biologically inspired pipeline [12], Spatial-Temporal Interest Points (STIP) [13], trajectory-based descriptors [2], audio clues [14], and the Convolutional Neural Networks based features [1], [5], [6], [15].…”
Section: Related Work
confidence: 99%
“…lMEL features also contribute. We also integrated two other types of features: sub-band auto-correlation features (SBPCA) [26] and self-organized units (SOUs) [19]; however, these had the least influence on overall performance, despite being potentially complementary as well. Table 4 shows the official evaluation results.…”
Section: System Combination and Fusion
confidence: 99%
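The quote describes folding SBPCA and SOU scores into a larger system combination. A minimal sketch of weighted score-level (late) fusion is given below; the system names and weights are placeholders, not values from the cited system.

```python
import numpy as np

def late_fusion(score_dict, weights=None):
    """Weighted score-level fusion of per-system event scores.
    score_dict maps system name -> array of scores over the same clips.
    Weights are illustrative; in practice they would be tuned on a dev set."""
    names = sorted(score_dict)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    stacked = np.stack([weights[n] * np.asarray(score_dict[n], dtype=float)
                        for n in names])
    return stacked.sum(axis=0)

# Example with hypothetical subsystem scores over two clips
fused = late_fusion({"mfcc": [0.7, 0.2], "sbpca": [0.6, 0.3], "sou": [0.5, 0.4]},
                    weights={"mfcc": 0.6, "sbpca": 0.2, "sou": 0.2})
```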
“…The presented audio and ASR systems were tuned for EK 100, and performed best for most conditions, with almost no tuning effort for the other setups. No low-level features could be used in the EK 0 condition, so only the semantic "noiseme" features, as described in Section 3.1, were used in the "Audio" case. Words of the ASR output and names of the semantic concepts that could be detected were mapped to the terms contained in the event kits in this case.…”
Section: System Combination and Fusion
confidence: 99%
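The quote says ASR words and detected concept names were mapped to the terms in the event kits for the zero-example (EK 0) condition. Below is a toy bag-of-terms overlap score illustrating one such mapping; the actual matching procedure used in the cited system is not specified in the excerpt.

```python
def eventkit_term_score(detected_terms, eventkit_terms):
    """Score an event hypothesis by overlap between detected terms
    (ASR words or names of detected semantic concepts) and the terms
    listed in an event kit. A toy bag-of-terms match, not the cited
    system's actual mapping."""
    detected = {t.lower() for t in detected_terms}
    kit = {t.lower() for t in eventkit_terms}
    if not kit:
        return 0.0
    return len(detected & kit) / len(kit)

# Example: matching ASR output against a hypothetical "birthday party" event kit
print(eventkit_term_score(["happy", "birthday", "cake", "music"],
                          ["birthday", "cake", "candles", "party"]))
```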