Action recognition is an important research direction in computer vision with a wide range of applications, such as video surveillance and human-robot interaction. Because of complex backgrounds and changes in viewing angle, accurate recognition and analysis of human motion in real-life scenarios remains a challenging problem. To improve the accuracy of pedestrian detection and motion recognition, this paper proposes a novel edge-aware end-to-end deep network, which uses an edge-aware pooling module to improve the accuracy of pedestrian contours and a multi-scale pyramid pooling layer to capture the spatial-temporal context features of video sequences. The complementary edge-related features effectively preserve clear boundaries, and combining the auxiliary side output with the pyramid pooling layer output extracts rich global context information. Extensive qualitative and quantitative experiments show that the proposed model improves the performance of existing pedestrian detection and motion recognition networks on the UCF-101, HMDB-51, and KTH datasets.

INDEX TERMS Motion recognition, edge perception, deep learning, pyramid pooling, spatial-temporal context.
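
The general idea behind multi-scale pyramid pooling can be illustrated with a minimal sketch. It is not the network proposed in this paper: it assumes a single 2D feature map stored as a list of rows, average pooling, and pyramid levels of 1, 2, and 4 bins per side, and the function names `adaptive_avg_pool` and `pyramid_pool` are hypothetical. Each level summarizes the map at a different granularity, and concatenating the levels gives a fixed-length descriptor that mixes global and local context.

```python
from math import ceil, floor


def adaptive_avg_pool(feature_map, bins):
    """Average-pool a 2D map (list of rows) down to a bins x bins grid.

    Returns the pooled values as a flat list in row-major order.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(bins):
        # Row range covered by the i-th vertical bin (never empty).
        r0, r1 = floor(i * height / bins), ceil((i + 1) * height / bins)
        for j in range(bins):
            # Column range covered by the j-th horizontal bin.
            c0, c1 = floor(j * width / bins), ceil((j + 1) * width / bins)
            cells = [feature_map[r][c]
                     for r in range(r0, r1)
                     for c in range(c0, c1)]
            pooled.append(sum(cells) / len(cells))
    return pooled


def pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Concatenate pooled grids from coarse to fine (levels are an assumption)."""
    descriptor = []
    for bins in levels:
        descriptor.extend(adaptive_avg_pool(feature_map, bins))
    return descriptor


if __name__ == "__main__":
    # Toy 4 x 4 feature map with values 0..15.
    fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    descriptor = pyramid_pool(fmap)
    print(len(descriptor))  # 1 + 4 + 16 = 21 values
```

The descriptor length depends only on the chosen levels, not on the input size, which is why pyramid pooling is commonly used to aggregate context from feature maps of varying resolution.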