Dense Dilated Network for Video Action Recognition

Xu, Baohan; Ye, Hao; Zheng, Yingbin; Wang, Heng; Luwang, Tianyu; Jiang, Yu‐Gang

doi:10.1109/tip.2019.2917283

Cited by 33 publications

(7 citation statements)

References 32 publications


“…e simulation test shows that the recognition algorithm can accurately determine the damage location and damage degree, and the result is stable. A sports video recognition model combining multiple features and NN was proposed in [18,19]. It extracts the static and dynamic features that reflect the sports video, classifies them separately using the RBF neural network, constructs the basic probability assignment of the preliminary recognition results, and uses the evidence theory to fuse the preliminary results to obtain the sports video recognition results.…”

Section: Related Work (mentioning)

confidence: 99%

Sports Action Recognition Based on Particle Swarm Optimization Neural Networks

Zhang

2022

Wireless Communications and Mobile Computing


Video acquisition has become more convenient as science and technology have progressed, and the development of mobile Internet has resulted in a large amount of video data being generated every day. The question of how to analyze these videos automatically has become urgent. Among them, the study of sports movement recognition in video has important theoretical implications in sports research as well as practical application value. This paper proposes a PSO-NN-based sports action recognition model. Kernel principal component analysis is used to extract and analyze the characteristics of sports movements. The improved neural network is used to identify common human postures in sports, and the classification and block background estimation method is used to detect human targets. The feature extraction of targets is completed according to the edge features. Finally, the feature vectors are trained using a backpropagation neural network (BPNN), and the parameters of the BPNN are chosen using the PSO algorithm to create a classifier for sports action recognition. The results show that this model improves the accuracy of sports video recognition and is an effective method of sports action recognition when compared to the comparison model.


“…Numerous studies have thus far been conducted in the field of computer vision on human action recognition using video data sets, and state-of-the-art accuracy is being updated with new studies reported frequently [30][31][32][33]; however, these technologies have not been widely applied to the field of surgery. To our knowledge, this is the first study to use 3-D CNN for actual laparoscopic surgical skill assessment.…”

Section: Discussion (mentioning)

confidence: 99%

Development and Validation of a 3-Dimensional Convolutional Neural Network for Automatic Surgical Skill Assessment Based on Spatiotemporal Video Analysis

et al. 2021


IMPORTANCE A high level of surgical skill is essential to prevent intraoperative problems. One important aspect of surgical education is surgical skill assessment, with pertinent feedback facilitating efficient skill acquisition by novices. OBJECTIVES To develop a 3-dimensional (3-D) convolutional neural network (CNN) model for automatic surgical skill assessment and to evaluate the performance of the model in classification tasks by using laparoscopic colorectal surgical videos. DESIGN, SETTING, AND PARTICIPANTS This prognostic study used surgical videos acquired prior to 2017. In total, 650 laparoscopic colorectal surgical videos were provided for study purposes by the Japan Society for Endoscopic Surgery, and 74 were randomly extracted. Every video had highly reliable scores based on the Endoscopic Surgical Skill Qualification System (ESSQS, range 1-100, with higher scores indicating greater surgical skill) established by the society. Data were analyzed June to December 2020. MAIN OUTCOMES AND MEASURES From the groups with scores less than the difference between the mean and 2 SDs, within the range spanning the mean and 1 SD, and greater than the sum of the mean and 2 SDs, 17, 26, and 31 videos, respectively, were randomly extracted. In total, 1480 video clips with a length of 40 seconds each were extracted for each surgical step (medial mobilization, lateral mobilization, inferior mesenteric artery transection, and mesorectal transection) and separated into 1184 training sets and 296 test sets. Automatic surgical skill classification was performed based on spatiotemporal video analysis using the fully automated 3-D CNN model, and classification accuracies and screening accuracies for the groups with scores less than the mean minus 2 SDs and greater than the mean plus 2 SDs were calculated. RESULTS The mean (SD) ESSQS score of all 650 intraoperative videos was 66.2 (8.6) points and for the 74 videos used in the study, 67.6 (16.1) points. The proposed 3-D CNN model automatically classified video clips into groups with scores less than the mean minus 2 SDs, within 1 SD of the mean, and greater than the mean plus 2 SDs with a mean (SD) accuracy of 75.0% (6.3%). The highest accuracy was 83.8% for the inferior mesenteric artery transection. The model also screened for the group with scores less than the mean minus 2 SDs with 94.1% sensitivity and 96.5% specificity and for group with greater than the mean plus 2 SDs with 87.1% sensitivity and 86.0% specificity. CONCLUSIONS AND RELEVANCE The results of this prognostic study showed that the proposed 3-D CNN model classified laparoscopic colorectal surgical videos with sufficient accuracy to be used for screening groups with scores greater than the mean plus 2 SDs and less than the mean minus 2 SDs. The proposed approach was fully automatic and easy to use for various types of surgery, and no ...


“…Dilation convolution [25] in action recognition is usually adopted to model temporal features and extract larger contextual information. In [26], a dense dilated network was trained to recognize actions from clip-level to global-level, by fusing outputs from each densely-connected dilated convolution layer. In temporal aggregation network (TAN) [27], a dedicated temporal aggregation block was designed to encode multi-scale spatio-temporal patterns, and larger temporal context can be captured by dilated convolutions effectively.…”

Section: Dilation Convolution Network (mentioning)

confidence: 99%
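
The last statement above describes dilated convolutions as a way to capture larger temporal context, with the outputs of densely connected layers fused together. The following is a minimal NumPy sketch of that general idea, not the architecture from the cited paper. All names (`dilated_conv1d`, `dense_dilated_stack`, `receptive_field`) are hypothetical, features are reduced to one scalar per clip, and dense connections are approximated by summing earlier outputs rather than concatenating channels.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Zero-padded 1-D dilated convolution along time (odd kernel sizes).

    x: (T,) one scalar feature per clip (a simplification);
    kernel: (K,) weights; dilation: gap between kernel taps.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)  # zero padding keeps the output length equal to T
    return np.array([
        sum(kernel[j] * xp[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ])

def dense_dilated_stack(x, kernels, dilations):
    """Stack dilated layers where each layer sees all earlier outputs.

    Assumption: dense connections are modelled as a sum of the input and
    all previous layer outputs; fusion is a plain average of layer outputs.
    """
    features = [x]
    outputs = []
    for kernel, d in zip(kernels, dilations):
        layer_in = np.sum(features, axis=0)
        out = dilated_conv1d(layer_in, kernel, d)
        features.append(out)
        outputs.append(out)
    return np.mean(outputs, axis=0)

def receptive_field(kernel_size, dilations):
    """Number of time steps the last layer of the stack can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# An impulse in the middle of 31 clips shows how far context spreads.
impulse = np.zeros(31)
impulse[15] = 1.0
fused = dense_dilated_stack(impulse, [np.ones(3)] * 3, [1, 2, 4])
span = int(np.count_nonzero(fused))
```

With kernel size 3 and dilations 1, 2, 4, the temporal context grows much faster than it would with undilated layers of the same depth, which is the property the citing authors point to.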