Recognizing instances at varying scales simultaneously is a fundamental challenge in visual detection problems. While spatial multi-scale modeling has been well studied in object detection, how to effectively apply a multi-scale architecture to temporal models for activity detection remains under-explored. In this paper, we identify three unique challenges that need to be specifically handled for temporal activity detection. To address these issues, we propose the Dynamic Temporal Pyramid Network (DTPN), a new activity detection framework with a multi-scale pyramidal architecture featuring three novel designs: (1) We dynamically sample the frame sequence at different frame rates (frames per second, FPS) to construct a natural pyramidal representation for arbitrary-length input videos. (2) We design a two-branch multi-scale temporal feature hierarchy to deal with the inherent temporal scale variation of activity instances. (3) We further exploit the temporal context of activities by appropriately fusing multi-scale feature maps, and demonstrate that both local and global temporal contexts are important. By combining all these components into a unified network, we obtain a single-shot activity detector with single-pass inference and end-to-end training. Extensive experiments show that the proposed DTPN achieves state-of-the-art performance on the challenging ActivityNet dataset.
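
To make design (1) concrete, the following is a minimal sketch of how frames might be sampled at several rates to form a fixed-size temporal pyramid from a video of arbitrary length. It is an illustration under stated assumptions, not the sampling scheme used by DTPN: the function name `temporal_pyramid_indices`, the parameters `base_len` and `levels`, the doubling of length per level, and the use of uniform spacing are all hypothetical choices.

```python
import numpy as np


def temporal_pyramid_indices(num_frames, base_len=4, levels=3):
    """Return one array of frame indices per pyramid level.

    Assumption (not from the paper): level k holds base_len * 2**k
    indices spread uniformly over the whole video. Because each level
    has a fixed length, its effective sampling rate (FPS) adapts to
    the duration of the input video.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    pyramid = []
    for k in range(levels):
        n = base_len * 2 ** k
        # Evenly spaced positions from the first to the last frame.
        positions = np.linspace(0, num_frames - 1, num=n)
        pyramid.append(np.round(positions).astype(int))
    return pyramid


if __name__ == "__main__":
    # A 300-frame video yields levels of 4, 8, and 16 sampled frames.
    for level, idx in enumerate(temporal_pyramid_indices(300)):
        print(level, len(idx), idx[0], idx[-1])
```

In this sketch the coarsest level covers the video with few frames (a low effective FPS) and finer levels cover the same span more densely, so videos of any length map to inputs of the same shape.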