2017
DOI: 10.1109/tmm.2017.2703939

DCAR: A Discriminative and Compact Audio Representation for Audio Processing

Abstract: This paper presents a novel two-phase method for audio representation, Discriminative and Compact Audio Representation (DCAR), and evaluates its performance at detecting events in consumer-produced videos. In the first phase of DCAR, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes into account both global structure and local structure. In this phase, the components are rendered more discrimina…

Cited by 12 publications (4 citation statements)
References 34 publications
“…In a different scenario, [57], a highly efficient procedure was proposed for music analysis based on matrix similarity representations to predict the popularity of a certain song according to its similarity with others. In [30], a novel technique for audio representation is presented. In particular, authors proposed DCAR, a two-phased method that outperforms the state-of-art audio representations and has considerable benefits in event detection and audio-scene classification.…”
Section: Background and Related Research
Citation type: mentioning
confidence: 99%
“…A suitable representation can effectively improve the generalization ability of the model, MFCC and logMel have been proven to be useful in CNNs. [48] presents a novel two-phase method for audio representation, they take into account both global structure and local structure, the learned representation can effectively represent the structure of audio. [49] argue that an image-like spectrogram cannot well capture the complex texture details of the spectrogram, so that they proposed a multichannel LBP feature to improve the robustness to the audio noise.…”
Section: A Audio Classification
Citation type: mentioning
confidence: 99%
“…With the arrival of smart video surveillance systems, innovative ways for quickly and effectively detecting malicious occurrences or behaviors in monitored settings based on real-time analysis of multimedia streams have emerged [12], [13]. Most real-world audio recordings are complicated in that they are composed of sequences of many different sounds [9], [14], [15].…”
Section: Introduction
Citation type: mentioning
confidence: 99%