Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks

Sozykin, Konstantin; Protasov, Stanislav; Khan, Adil; Hussain, Rasheed; Lee, Jooyoung

doi:10.1109/snpd.2018.8441034

Cited by 47 publications

(32 citation statements)

References 20 publications

“…Video summarized using the proposed system received a good score of 4 out of 5 from the participants. Sozykin K et al [17] proposed a 3D CNN-based multi-label deep Human Action Recognition (HAR) system for sports video summarization for the sport of Hockey and presented more than ten classes. Data pre-processing techniques like resizing, normalization, windowing, and sequence labeling were used.…”

Section: Scene Classification Via Deep-learning Approach (mentioning)

confidence: 99%

Scene Classification for Sports Video Summarization Using Transfer Learning

Rafiq

Agyeman

et al. 2020

Sensors

This paper proposes a novel method for sports video scene classification with the particular intention of video summarization. Creating and publishing a shorter version of the video is more interesting than a full version due to instant entertainment. Generating shorter summaries of the videos is a tedious task that requires significant labor hours and unnecessary machine occupation. Due to the growing demand for video summarization in marketing, advertising agencies, awareness videos, documentaries, and other interest groups, researchers are continuously proposing automation frameworks and novel schemes. Since scene classification is a fundamental component of video summarization and video analysis, the quality of scene classification is particularly important. This article focuses on various practical implementation gaps in the existing techniques and presents a method to achieve high-quality scene classification. We consider cricket as a case study and classify five scene categories, i.e., batting, bowling, boundary, crowd and close-up. We build our model on a pre-trained AlexNet Convolutional Neural Network (CNN) for scene classification. The proposed method employs new, fully connected layers in an encoder fashion. We employ data augmentation to achieve a high accuracy of 99.26% over a smaller dataset. We conduct a performance comparison against baseline approaches as well as state-of-the-art models to prove the superiority of the method. We evaluate our performance results on cricket videos and compare various deep-learning models, i.e., Inception V3, Visual Geometry Group (VGGNet16, VGGNet19), Residual Network (ResNet50), and AlexNet. Our experiments demonstrate that our method with AlexNet CNN produces better results than existing proposals.

“…They used pre-trained CNN to first extract the features then use LSTM for classification of the five types of events which are dump in, dump out, loose puck recovery, pass and shot. Sozykin et al [35] presented a 3D CNN based action recognition system for multi-class imbalanced in ice hockey. They first extract features from both single image and a slice of frames using CNN.…”

Section: Deep Learning Architecture In Sport Video Analysis (mentioning)

confidence: 99%

Deep learning in sport video analysis: a review

et al. 2020

Sport is a competitive field and is often treated as a measure of a country's development. For this reason, sport analysis has become a major contributor to analysing and improving an athlete's performance level. Video has become a crucial tool used in sport analysis by coaches and performance analysts. A wide variety of techniques has been used in sport video analysis. The main purpose of this review paper is to compare traditional handcrafted approaches with deep learning approaches in sport video analysis based on human activity recognition, to give an overview of recent studies in video-based human activity recognition in sport analysis, and to conclude with potential future directions in sport video analysis.

“…The use of multiple labels to represent multiple actions has grown in popularity due to the interest in detecting and recognizing simultaneous activity in videos. For example, concurrent action recognition in hockey videos [1] could indicate that a 'Play', 'Face Off' and 'Fight' took place at the same time. Similarly, the ability to tag multiple facial expressions in videos can be accomplished using multilabels to detect emotions, a crucial component of HCI [2].…”

Section: Introduction (mentioning)

confidence: 99%

“…Similarly, the ability to tag multiple facial expressions in videos can be accomplished using multilabels to detect emotions, a crucial component of HCI [2]. However, in all of the aforementioned references, an action or a combination of actions is assigned to a label, which in turn is given a binary assignment representing its absence (0) or its presence (1).…”

Section: Introduction (mentioning)

confidence: 99%

“…The contributions of the proposed work include the following: i) we incorporate a novel spatial aspect to our multilabel by assigning each label to a different region of interest. To the best of our knowledge, this is a unique way of handling the description of concurrent actions within a spatial context; ii) our proposed approach is able to detect different levels of activity, in contrast to existing approaches providing only a binary result as the presence or absence indicator [1], [2], [12]. Instead of simply assigning a 0 (absence of motion) or a 1 (presence of motion) to each label, we can assign, without loss of generality, a 0 (absence of motion), 1 (low level of motion) or 2 (high level of motion) to describe the level of motion in different regions of interest at a certain moment in time.…”

Section: Introduction (mentioning)

confidence: 99%

Simultaneous and Spatiotemporal Detection of Different Levels of Activity in Multidimensional Data

et al. 2020

In this work, we present a novel and promising approach to autonomously detect different levels of simultaneous and spatiotemporal activity in multidimensional data. We introduce a new multilabeling technique, which assigns different labels to different regions of interest in the data, and thus, incorporates the spatial aspect. Each label is built to describe the level of activity/motion to be monitored in the spatial location that it represents, in contrast to existing approaches providing only a binary result as the presence or absence of activity. This novel Spatially and Motion-Level Descriptive (SMLD) labeling schema is combined with a Convolutional Long Short Term Memory-based network for classification to capture different levels of activity both spatially and temporally without the use of any foreground or object detection. The proposed approach can be applied to various types of spatiotemporal data captured for completely different application domains. In this paper, it was evaluated on video data as well as respiratory sound data. Metrics commonly associated with multilabeling, namely Hamming Loss and Subset Accuracy, as well as confusion matrix-based measurements are used to evaluate performance. Promising testing results are achieved with an overall Hamming Loss for video datasets close to 0.05, Subset Accuracy close to 80% and confusion matrix-based metrics above 0.9. In addition, our proposed approach's ability in detecting frequent motion patterns based on predicted spatiotemporal activity levels is discussed. Encouraging results have been obtained on the respiratory sound dataset as well, while detecting abnormalities in different parts of the lungs. The experimental results demonstrate that the proposed approach can be applied to various types of spatiotemporal data captured for different application domains.
