TEDdet: Temporal Feature Exchange and Difference Network for Online Real-Time Action Detection

Liu, Yu; Yang, Fan; Ginhac, Dominique

doi:10.1109/access.2022.3164730

Cited by 2 publications (1 citation statement); references 34 publications.


“…As a subfield of video action understanding, video action recognition [2] involves only classifying the action categories in video clips, while temporal action detection [3]-[5], [8] focuses on identifying the start and end times of actions within video clips. In contrast, the difficulty of spatio-temporal action detection [6], [7], [38], [39] surpasses these tasks as it requires simultaneously determining the spatial and temporal locations of action instances in the video. Spatio-temporal action detection typically follows a two-step strategy: detection and linking.…”

Section: Introduction (mentioning; confidence: 99%)

Online Hierarchical Linking of Action Tubes for Spatio-Temporal Action Detection Based on Multiple Clues

Su, Zhang (2024), IEEE Access


The spatio-temporal action detection task requires outputting the temporal and spatial positions, as well as the action category, of target action instances in the form of action tubes. However, the current definition of video-level metrics for spatio-temporal action detection is neither clear nor unified enough to fully describe the ability of network models to perform spatio-temporal detection. Furthermore, existing tube linking methods not only depend heavily on the quality of the detection stage but also lack reliable linking criteria, resulting in poor tube linking performance. To address these issues, this study proposes a hierarchical linking method based on multiple clues, abbreviated as MCHL. The method first dynamically exploits various correlation clues at two levels, including appearance features, spatial overlap, motion prediction, category scores, tube length, and tube confidence status, to reduce the negative impact of unreliable information on correlation. It then uses inter-class correlation to handle the mutual influence between different categories, followed by joint probabilistic data association to handle the mutual influence between correlated objects, ultimately achieving robust and accurate online linking of action tubes. The method is compared experimentally with other correlation methods on the untrimmed UCF24 and MultiSports datasets, demonstrating state-of-the-art tube linking performance. We also conduct ablation experiments to explore the impact of the different modules and stages of the proposed tube linking method.

INDEX TERMS: MCHL, spatio-temporal action detection, linking method, untrimmed video mAP.
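To make the "linking" step concrete, here is a minimal sketch of online, frame-by-frame tube linking driven by several clues. It is not MCHL: it assumes a fixed weighted sum of only three of the clues listed above (spatial overlap as box IoU, appearance as cosine similarity of feature vectors, and category-score agreement as the overlap of per-class score vectors) plus simple greedy matching. MCHL's dynamic clue weighting, two-level hierarchy, inter-class correlation, and joint probabilistic data association are not reproduced. The names `Detection`, `Tube`, `clue_score`, `link_frame`, the weights, and the threshold are all hypothetical.

```python
import math
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Spatial-overlap clue: intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Appearance clue: cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


@dataclass
class Detection:  # hypothetical per-frame detector output
    box: Box
    feature: list[float]
    scores: list[float]  # per-class scores, assumed to sum to 1


@dataclass
class Tube:  # hypothetical action tube: an ordered list of linked detections
    detections: list[Detection] = field(default_factory=list)

    @property
    def last(self) -> Detection:
        return self.detections[-1]


def clue_score(tube: Tube, det: Detection,
               w_iou: float = 0.5, w_app: float = 0.3, w_cls: float = 0.2) -> float:
    """Fixed-weight combination of three clues (weights are illustrative only)."""
    # Category clue: overlap of the two class-score distributions.
    cls_agree = sum(min(p, q) for p, q in zip(tube.last.scores, det.scores))
    return (w_iou * iou(tube.last.box, det.box)
            + w_app * cosine(tube.last.feature, det.feature)
            + w_cls * cls_agree)


def link_frame(tubes: list[Tube], dets: list[Detection],
               threshold: float = 0.3) -> list[Tube]:
    """Greedily extend tubes with the current frame's detections, best pair first.

    Unmatched detections start new tubes; tube termination is not handled.
    """
    pairs = sorted(((clue_score(t, d), ti, di)
                    for ti, t in enumerate(tubes) for di, d in enumerate(dets)),
                   reverse=True)
    used_t: set[int] = set()
    used_d: set[int] = set()
    for score, ti, di in pairs:
        if score < threshold:
            break  # remaining pairs score even lower
        if ti in used_t or di in used_d:
            continue  # each tube and detection is matched at most once per frame
        tubes[ti].detections.append(dets[di])
        used_t.add(ti)
        used_d.add(di)
    for di, d in enumerate(dets):
        if di not in used_d:
            tubes.append(Tube([d]))
    return tubes
```

Feeding a first frame with one detection creates one tube; a second frame with a slightly shifted, similar-looking detection plus a distant, dissimilar one extends the existing tube and opens a second. Greedy matching is a simplification: an optimal one-to-one assignment (for example, the Hungarian algorithm) or a probabilistic association scheme would replace it in a more faithful implementation.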

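The abstract's point about video-level metrics concerns how predicted tubes are matched to ground truth. One common building block of video-mAP is the spatio-temporal IoU between two tubes: the temporal IoU of their frame spans multiplied by the mean per-frame spatial IoU over the frames they share. The sketch below implements that widely used definition for illustration only; it is not the metric the MCHL authors propose. Representing a tube as a dict from frame index to box, and assuming every frame in a tube's span has a box, are hypothetical choices.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Spatial IoU of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def st_iou(pred: dict[int, Box], gt: dict[int, Box]) -> float:
    """Spatio-temporal IoU: temporal IoU x mean spatial IoU on shared frames.

    Both tubes are assumed to have a box for every frame in their span.
    """
    p_start, p_end = min(pred), max(pred)
    g_start, g_end = min(gt), max(gt)
    t_inter = min(p_end, g_end) - max(p_start, g_start) + 1
    if t_inter <= 0:
        return 0.0  # no temporal overlap
    t_union = max(p_end, g_end) - min(p_start, g_start) + 1
    shared = range(max(p_start, g_start), min(p_end, g_end) + 1)
    mean_spatial = sum(box_iou(pred[f], gt[f]) for f in shared) / t_inter
    return (t_inter / t_union) * mean_spatial
```

For example, a predicted tube covering frames 0 to 9 and a ground-truth tube covering frames 5 to 14, with identical boxes on the shared frames, share 5 of 15 frames, so their spatio-temporal IoU is 1/3. A detection typically counts as correct in video-mAP when this value exceeds a chosen threshold and the predicted class matches.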