VSD2014: A dataset for violent scenes detection in hollywood movies and web videos

Schedi, Markus; Sjöberg, Mats; Mironica, Ionut; Ionescu, Bogdan; Quang, Vu Lam; Jiang, Yu‐Gang; Demarty, Claire-Hélène

doi:10.1109/cbmi.2015.7153604

Cited by 23 publications

(23 citation statements)

References 22 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…At the beginning, the algorithm selects groups G 3 , G 4 and G 5 in practically 100% of the cases because of the low threshold imposed (1 MaxMFLOPS). When we increase this value to 3 MaxMFLOPS the spectral features appear.…”

Section: Resultsmentioning

confidence: 99%

See 1 more Smart Citation

Energy-Efficient Acoustic Violence Detector for Smart Cities

Bautista-Durán¹,

García-Gómez²,

Gil-Pita³

et al. 2017

IJCIS

View full text Add to dashboard Cite

Violence detection represents an important issue to take into account in the design of intelligent algorithms for smart environments. This paper proposes an energy-efficient system capable of acoustically detecting violence. In our solution, genetic algorithms are used to select the best subset of features with a constrained computational cost. Results demonstrate the viability of the system, thanks to the low cost that some violence features require, making feasible the implementation of the proposed method in a nowadays low power microprocessor.

show abstract

Section: Resultsmentioning

confidence: 99%

“…In this sense, violence can be detected through audio and video surveillance. Some works in the literature treat this problem using both audio and video processing, 3,4,5 and the results obtained with the combination of those sources seems to be efficient.…”

Section: Introductionmentioning

confidence: 99%

Energy-Efficient Acoustic Violence Detector for Smart Cities

Bautista-Durán¹,

García-Gómez²,

Gil-Pita³

et al. 2017

IJCIS

View full text Add to dashboard Cite

show abstract

“…This approach outperformed the other methods on the second VSD sub-task (i.e., violence detection in user generated videos from YouTube). The most common features used by most of the participating teams were MFCC (audio) and dense trajectories (visual+temporal) [17].…”

Section: Introductionmentioning

confidence: 99%

“…The Affect Task of MediaEval has provided a common ground for researchers to work on this problem and compare their algorithms in an efficient way. A publicly available dataset provides a detailed annotation ground truth of multiple audio and visual concepts concerning violence [17]. In MediaEval 2014, many teams participated for the VSD task [17].…”

Section: Introductionmentioning

confidence: 99%

“…A publicly available dataset provides a detailed annotation ground truth of multiple audio and visual concepts concerning violence [17]. In MediaEval 2014, many teams participated for the VSD task [17]. In [18], the authors used Deep Neural Networks (DNN) along with Support Vector Machines (SVM) and extracted different audio-visual features (i.e., MFCC, dense trajectories [19], spatio-temporal interest points (STIP)).…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Violent Scene Detection Using a Super Descriptor Tensor Decomposition

Khokher

Bouzerdoum

Phung

2015

2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

View full text Add to dashboard Cite

This article presents a new method for violent scene detection using super descriptor tensor decomposition. Multi-modal local features comprising auditory and visual features are extracted from Mel-frequency cepstral coefficients (including first and second order derivatives) and refined dense trajectories. There is usually a large number of dense trajectories extracted from a video sequence; some of these trajectories are unnecessary and can affect the accuracy. We propose to refine the dense trajectories by selecting only discriminative trajectories in the region of interest. Visual descriptors consisting of oriented gradient and motion boundary histograms are computed along the refined dense trajectories. In traditional bag-of-visual-words techniques, the feature descriptors are concatenated to form a single large feature vector for classification. This destroys the spatio-Temporal interactions among features extracted from multi-modal data. To address this problem, a super descriptor tensor decomposition is proposed. The extracted feature descriptors are first encoded using super descriptor vector method. Then the encoded features are arranged as tensors so as to retain the spatioTemporal structure of the features. To obtain a compact set of features for classification, the TUCKER-3 decomposition is applied to the super descriptor tensors, followed by feature selection using Fisher feature ranking. The obtained features are fed to a support vector machine classifier. Experimental evaluation is performed on violence detection benchmark dataset, MediaEval VSD2014. The proposed method outperforms most of the state-of-The-Art methods, achieving MAP2014 scores of 60.2% and 67.8% on two subsets of the dataset. Abstract-This article presents a new method for violent scene detection using super descriptor tensor decomposition. Multi-modal local features comprising auditory and visual features are extracted from Mel-frequency cepstral coefficients (including first and second order derivatives) and refined dense trajectories. There is usually a large number of dense trajectories extracted from a video sequence; some of these trajectories are unnecessary and can affect the accuracy. We propose to refine the dense trajectories by selecting only discriminative trajectories in the region of interest. Visual descriptors consisting of oriented gradient and motion boundary histograms are computed along the refined dense trajectories. In traditional bag-of-visual-words techniques, the feature descriptors are concatenated to form a single large feature vector for classification. This destroys the spatio-temporal interactions among features extracted from multi-modal data. To address this problem, a super descriptor tensor decomposition is proposed. The extracted feature descriptors are first encoded using super descriptor vector method. Then the encoded features are arranged as tensors so as to retain the spatio-temporal structure of the features. To obtain a compact set of features for classification, the TUCKER-3 de...

show abstract