Current methods of human activity recognition face many challenges, such as the need for multiple sensors, poor implementation, unreliable real-time performance, and lack of temporal location. In this research, we developed a method for recognizing and locating human activities based on temporal action recognition. For this work, we used a multilayer convolutional neural network (CNN) to extract features. In addition, we used refined actionness grouping to generate precise region proposals. Then, we classified the candidate regions by employing an activity classifier based on a structured segmented network and a cascade design for end-to-end training. Compared with previous methods of action classification, the proposed method adds the time boundary and effectively improves the detection accuracy. To test this method empirically, we conducted experiments utilizing surveillance video of an offshore oil production plant. Three activities were recognized and located in the untrimmed long video: standing, walking, and falling. The accuracy of the results proved the effectiveness and real-time performance of the proposed method, demonstrating that this approach has great potential for practical application.