2021
DOI: 10.48550/arxiv.2106.14447
Preprint

Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection

Xin Zhou,
Le Kang,
Zhiyu Cheng
et al.

Abstract: With rapidly evolving internet technologies and emerging tools, sports related videos generated online are increasing at an unprecedentedly fast pace. To automate sports video editing/highlight generation process, a key task is to precisely recognize and locate the events in the long untrimmed videos. In this tech report, we present a two-stage paradigm to detect what and when events happen in soccer broadcast videos. Specifically, we fine-tune multiple action recognition models on soccer data to extract high-…

Citations: cited by 7 publications (15 citation statements)
References: 19 publications
“…The baseline performance correspond to last year's winner [32]. As shown in Table 2, this year's winning method significantly improved the spotting performance for tight intervals in both the challenge and test sets.…”
Section: Results (mentioning)
confidence: 89%
“…In Table 4, we compare E2E-Spot to the best results from the CVPR 2021 (lenient tolerances of 5-60 sec) and CVPR 2022 (less coarse, 1-5 sec tolerances) SoccerNet Action Spotting challenges [14]. E2E-Spot, with the 200MF CNN, matches the top prior method from the 2021 competition [75] in the 5-60 sec setting while outperforming it by 13.7-14.1 avg-mAP points in the less coarse, 1-5 sec setting. Increasing the CNN to 800MF improves avg-mAP slightly (by 0.4-2.7 avg-mAP).…”
Section: Results on the SoccerNet Action Spotting Challenge (mentioning)
confidence: 99%
“…E2E-Spot places second in the (concurrent) 2022 competition (within 1.1 avg-mAP), after Soares et al [54], due to the latter's strong performance on unshown actions (not visible in the frame). Soares et al [54,55] and Zhou et al [75] are two-phase approaches, combining pre-extracted features from multiple (5 to 6) heterogeneous, fine-tuned feature extractors and proposing downstream architectures and losses on those features. In contrast, E2E-Spot shows that direct, end-to-end training of a simple and compact model can be a surprisingly strong baseline.…”
Section: Results on the SoccerNet Action Spotting Challenge (mentioning)
confidence: 99%