Objective: To develop a robust and effective computer vision system that automatically identifies and classifies human actions in video data, accounting for temporal dynamics and varied environmental conditions. This technology has numerous applications in surveillance, human-computer interaction, and video analysis. Methods: Dense trajectories are extracted by densely sampling points and tracking them with dense optical flow, which computes a motion vector for each point; alternative approaches rely on sparse key point detectors such as the Scale-Invariant Feature Transform (SIFT) or the Harris corner detector. Findings: By describing the motion of the trajectories, trajectory descriptors produce remarkably strong results on their own: 90.2% on KTH and 47.7% on Hollywood2 for dense trajectories. This demonstrates the significance of the motion information present in the local trajectory patterns. Because the trajectory descriptors capture a large amount of camera motion, we report only 67.2% on YouTube. Novelty: This study presents a method for modeling videos that combines dense sampling and feature tracking. Compared with earlier video descriptors, our dense trajectories are more robust; they effectively capture the motion information in videos and outperform state-of-the-art action classification techniques.
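
The trajectory descriptor mentioned above can be sketched as follows: given a sequence of tracked point positions, the descriptor is the sequence of frame-to-frame displacement vectors, normalized by the sum of their magnitudes. This is a minimal illustration in NumPy; the function name and array shapes are our own assumptions, not part of the original system.

```python
import numpy as np

def trajectory_descriptor(points):
    """Normalized-displacement descriptor for one tracked trajectory.

    points: (L+1, 2) array-like of (x, y) positions over L+1 frames.
    Returns a flat vector of L displacement vectors, each normalized
    by the sum of displacement magnitudes along the trajectory.
    (Hypothetical sketch; the full system also uses HOG/HOF-style
    descriptors around each trajectory.)
    """
    pts = np.asarray(points, dtype=float)
    deltas = np.diff(pts, axis=0)                 # per-frame displacements
    total = np.linalg.norm(deltas, axis=1).sum()  # sum of magnitudes
    if total == 0.0:
        # A static trajectory carries no motion information.
        return np.zeros(deltas.size)
    return (deltas / total).ravel()

# Example: a point drifting right for two frames, then up for one.
desc = trajectory_descriptor([(0, 0), (1, 0), (2, 0), (2, 1)])
```

After normalization, the magnitudes of the descriptor's displacement vectors sum to one, which makes trajectories of different overall speeds comparable.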