2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020
DOI: 10.1109/cvprw50498.2020.00456

Improved Soccer Action Spotting using both Audio and Video Streams

Abstract: In this paper, we propose a study on multi-modal (audio and video) action spotting and classification in soccer videos. Action spotting and classification are the tasks of finding the temporal anchors of events in a video and determining which events they are. This is an important application of general activity understanding. Here, we propose an experimental study on combining audio and video information at different stages of deep neural network architectures. We used the SoccerNet benchmark datase…

Cited by 29 publications
(22 citation statements)
References 69 publications
“…Another issue observed both in our results and in various SoccerNet papers (e.g., [1,4,73,80]) is that the input window greatly affects performance. A larger window yields more data to analyze and delays inference prediction in any live event scenario as one must wait for the window to be available and processed.…”
Section: Discussion (mentioning)
confidence: 58%
“…New models are currently being developed and tested. For example, in the SoccerNet-v2 challenge [80], the best dataset benchmark models, CALF [4], AudioVid [73] and NetVLAD [1], have achieved average-mAP values of 72.2%, 69.7%, and 54.9% for goal events, respectively. Moreover, Zhou et al [69] also achieved good results in the SoccerNet-v2 spotting challenge as the winning team, with an overall average-mAP of about 75% (the authors do not provide event-specific numbers).…”
Section: Discussion (mentioning)
confidence: 99%
“…In sports analytics, many computer vision technologies are developed to understand sports broadcasts [15]. Specifically in soccer, researchers propose algorithms to detect players on field in real time [2], analyze pass feasibility using player's body orientation [1], incorporate both audio and video streams to detect events [17], recognize group activities on the field using broadcast stream and trajectory data [14], aggregate deep frame features to spot major game events [8], and leverage the temporal context information around the actions to handle the intrinsic temporal patterns representing these actions [3,9].…”
Section: Related Work (mentioning)
confidence: 99%