Temporal action proposal generation aims to localize temporal segments of human activities in videos. Current boundary-based proposal generation methods can generate proposals with precise boundary, but often suffer from the inferior quality of confidence scores used for proposal retrieving. In this paper, we propose an effective and end-to-end action proposal generation method, named ProposalVLAD with Proposal-Intra Exploring Network (PVPI-Net). We first propose a ProposalVLAD module to dynamically generate global features of the entire video, then we combine the global features and proposal local features to generate the final feature representations for all candidate proposals. Then, we design a novel Proposal-Intra Loss function (PI-Loss) to generate more reliable proposal confidence scores. Extensive experiments on large-scale and challenging datasets demonstrate the effectiveness of our proposed method. Experimental results show that our PVPI-Net achieves significant improvements on two benchmark datasets (
i.e.
, THUMOS’14 and ActivityNet-1.3), and sets new records for temporal action detection task.