Estimation of Important Scenes in Soccer Videos Based on Collaborative Use of Audio-Visual CNN Features

Haruyama, Tomoki; Takahashi, Sho; Ogawa, Takahiro; Haseyama, Miki

doi:10.1109/gcce.2018.8574727

Cited by 6 publications

(4 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…It is not common to extract audio features using VGG16 trained by using ImageNet. However, since the effectiveness of the audio feature extraction using a CNN model trained by using ImageNet has been reported in the method for detection of important scenes of other sports videos [27], the proposed method experimentally adopts VGG16 trained by using ImageNet. Thus, it is expected to be effective for our tasks.…”

Section: ) Audio Featuresmentioning

confidence: 99%

See 1 more Smart Citation

Detection of Important Scenes in Baseball Videos via Bidirectional Time Lag Aware Deep Multiset Canonical Correlation Analysis

et al. 2021

Self Cite

View full text Add to dashboard Cite

A novel method for detection of important scenes in baseball videos based on correlation maximization between heterogeneous modalities via bidirectional time lag aware deep multiset canonical correlation analysis (BiTl-dMCCA) is presented in this paper. The proposed method enables detection of important scenes by collaboratively using baseball videos and their corresponding tweets. The technical contributions of this paper are twofold. First, since there are time lags between not only "tweets and corresponding multiple previous events" but also "events and corresponding multiple following posted tweets", the proposed method considers these bidirectional time lags. Specifically, the representation of such bidirectional time lags into the derivation of their covariance matrices is newly introduced. Second, the proposed method adopts textual, visual and audio features calculated from tweets and videos as multimodal time series features. Important scenes are detected as abnormal scenes via anomaly detection based on a generative adversarial network using multi-modal features projected by BiTl-dMCCA. The proposed method does not need any training data with annotation. Experimental results obtained by applying the proposed method to actual baseball matches show the effectiveness of the proposed method.INDEX TERMS Unsupervised important scene detection, time lag aware canonical correlation maximization, anomaly detection, generative adversarial network.

show abstract

Section: ) Audio Featuresmentioning

confidence: 99%

“…Comp. 12: This is a method based on [27] using a support vector machine (SVM) [39] for visual and audio features. In order to provide a fair comparison, a one-class SVM [40], which is an unsupervised method, was used instead of a general SVM for Comp.…”

Section: A Experimental Settingmentioning

confidence: 99%

Detection of Important Scenes in Baseball Videos via Bidirectional Time Lag Aware Deep Multiset Canonical Correlation Analysis

et al. 2021

Self Cite

View full text Add to dashboard Cite

show abstract

“…Free kick scenes our previously reported method [35], and the main improvement in the proposed method is the introduction of whistle features. Furthermore, we introduce CM6, a method based on a CNN model that is pre-trained with VGG16 [41].…”

Section: Foul Scenesmentioning

confidence: 99%

“…By realizing this approach, various important scenes can be accurately extracted from far-view soccer videos. It should be noted that this paper is an extended version of [35].…”

Section: Introductionmentioning

confidence: 99%

[Papers] Multimodal Important Scene Detection in Far-view Soccer Videos Based on Single Deep Neural Architecture

Haruyama¹,

Takahashi

Ogawa

et al. 2020

MTA

Self Cite

View full text Add to dashboard Cite

The details of the matches of soccer can be estimated from visual and audio sequences, and they correspond to the occurrence of important scenes. Therefore, the use of these sequences is suitable for important scene detection. In this paper, a new multimodal method for important scene detection from visual and audio sequences in far-view soccer videos based on a single deep neural architecture is presented. A unique point of our method is that multiple classifiers can be realized by a single deep neural architecture that includes a Convolutional Neural Network-based feature extractor and a Support Vector Machine-based classifier. This approach provides a solution to the problem of not being able to simultaneously optimize different multiple deep neural architectures from a small amount of training data. Then we monitor confidence measures output from this architecture for the multimodal data and enable their integration to obtain the final classification result.

show abstract

Sports video summarization using acoustic symmetric ternary codes and SVM

Banjar,

Dawood,

Javed

et al. 2024

Applied Acoustics

View full text Add to dashboard Cite

Estimation of Important Scenes in Soccer Videos Based on Collaborative Use of Audio-Visual CNN Features

Cited by 6 publications

References 9 publications

Detection of Important Scenes in Baseball Videos via Bidirectional Time Lag Aware Deep Multiset Canonical Correlation Analysis

Detection of Important Scenes in Baseball Videos via Bidirectional Time Lag Aware Deep Multiset Canonical Correlation Analysis

[Papers] Multimodal Important Scene Detection in Far-view Soccer Videos Based on Single Deep Neural Architecture

Sports video summarization using acoustic symmetric ternary codes and SVM

Contact Info

Product

Resources

About