This paper uses a deep confidence neural network algorithm to design and analyze a risk warning model for stadium operation. Factors such as video shooting angle, background brightness, feature diversity, and the relationships between human behaviors make feature attribute-based behavior detection a focus of research attention. To address these factors, researchers have proposed extracting human skeleton and optical flow feature information from videos. The key to the deep confidence neural network-based recognition method is the extraction of the human skeleton: a skeleton sequence of human behavior is extracted from a surveillance video, where each frame of the skeleton contains 18 joints together with a confidence value estimated for that frame. A deep confidence neural network model is then built to classify dangerous behavior from the skeleton feature information combined with the time vector of the skeleton sequence, and the danger level of the behavior is determined by setting a corresponding threshold. Compared with the spatiotemporal graph convolutional network, the deep confidence neural network uses different feature information: its model is built on human optical flow information combined with temporal relational inference over video frames. The key to the temporal relationship network-based recognition method is to feed frames sampled from the video, either in order or at random, into the temporal relationship network. Comparison experiments with several methods show that recognition based on skeleton and optical flow features performs significantly better than algorithms based on manual feature extraction.
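As a rough illustration of the data layout and the thresholding step described above, the following minimal Python sketch represents a skeleton sequence as a list of frames, each with 18 joints, and maps a classifier score to a discrete danger level. This is not the paper's implementation: the (x, y, confidence) joint layout, the function names, the averaging of joint confidences per frame, and the threshold values are all assumptions made for illustration, and the random values stand in for the output of a real pose estimator.

```python
import bisect
import random

NUM_JOINTS = 18  # joints per skeleton frame, as stated in the text


def make_skeleton_sequence(num_frames, seed=0):
    """Build a placeholder skeleton sequence: frames x 18 joints x 3 values.

    Each joint is an (x, y, confidence) tuple; this layout is an assumption.
    Values are random stand-ins for the output of a pose estimator.
    """
    rng = random.Random(seed)
    return [
        [(rng.random(), rng.random(), rng.random()) for _ in range(NUM_JOINTS)]
        for _ in range(num_frames)
    ]


def frame_confidence(sequence):
    """Average the joint confidences in each frame (one value per frame).

    Averaging is an assumed way to obtain a per-frame confidence.
    """
    return [sum(joint[2] for joint in frame) / len(frame) for frame in sequence]


def danger_level(score, thresholds=(0.3, 0.6, 0.85)):
    """Map a classifier score in [0, 1] to a discrete danger level.

    The threshold values are hypothetical; level 0 means no warning.
    """
    return bisect.bisect_right(thresholds, score)
```

With these hypothetical thresholds, `danger_level(0.7)` returns 2, since the score lies between the second and third thresholds. In practice the score would come from the trained classifier rather than being supplied by hand.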