“…Temporal action localization (TAL) is an important visual task with numerous applications (e.g., anomaly detection [1] and video retrieval [2]) and has witnessed remarkable progress in the fully-supervised setting [3]. To bypass the tedious manual annotation of action boundaries, video-level weakly-supervised TAL methods [4]-[6], which localize actions with only video-level class labels, have drawn increasing attention. However, due to the absence of explicit location supervision, they suffer from action-background confusion and lag far behind fully-supervised methods.…”