Human action recognition and posture prediction aim to recognize the actions and predict the postures of persons in videos, respectively. Both are active research topics in the computer vision community and have attracted considerable attention from academia and industry. They are also preconditions for intelligent interaction and human-computer cooperation, helping machines perceive the external environment. In the past decade, tremendous progress has been made in the field, especially since the emergence of deep learning technologies, so a comprehensive review of recent developments is warranted. In this paper, we first present the background and discuss research progress. We then introduce datasets and typical feature representation methods, and survey advanced human action recognition and posture prediction algorithms. Finally, in view of the challenges in the field, we identify directions for future research and illustrate the importance of action recognition and posture prediction using interactive cognition in self-driving vehicles as an example.
[Graphical abstract: research on Multi-agent Reinforcement Learning (MARL) for vehicle-to-environment and vehicle-to-vehicle (V2V) interaction. Key scientific issues: sensitivity to model parameters and decision risk caused by uncertainty. Main research contents: collaborative behavior between vehicles, distributed partially observable Markov decision processes, robust and risk-aversion MARL, and a prototype system covering competitive scenarios, all-weather and diverse road types, and complex environments (e.g., sensing traffic lights, stopping before the stop line, keeping the current state, and jointly evaluating task difficulty and vehicle performance).]
In the control of self-driving vehicles, PID controllers are widely used due to their simple structure and good stability. However, in complex self-driving scenarios such as curved roads, car following, and overtaking, stable control accuracy must still be ensured. Some researchers have used fuzzy PID to dynamically change the PID parameters so that vehicle control remains stable, but it is difficult to guarantee the fuzzy controller's performance when the size of the domain is not selected properly. This paper designs a variable-domain fuzzy PID intelligent control method based on Q-Learning that makes the system robust and adaptable by dynamically changing the size of the domain to further ensure the control effect of the vehicle. The variable-domain fuzzy PID algorithm based on Q-Learning takes the error and the error rate of change as input and uses Q-Learning to learn the scaling factor online, thereby adjusting the PID parameters online. The proposed method is verified on the Panosim simulation platform. Experiments show that accuracy is improved by 15% compared with traditional fuzzy PID, demonstrating the effectiveness of the algorithm.
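To make the idea concrete, below is a minimal, illustrative sketch (not the paper's implementation) of how Q-Learning can tune a domain scaling factor online: the state is the discretized (error, error-rate) pair, each action selects a candidate scaling factor applied to a PID law, and the reward penalizes the remaining tracking error. The toy first-order plant, all constants, and all function names are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only: Q-Learning selects a domain scaling factor
# for a fuzzy-PID-style controller. State = discretized (error, error
# rate); action = choice of scaling factor; reward = negative absolute
# tracking error. All names and constants are assumptions.

N_BINS = 7                                       # bins per state dimension
ACTIONS = np.array([0.5, 0.75, 1.0, 1.25, 1.5])  # candidate scaling factors
Q = np.zeros((N_BINS, N_BINS, len(ACTIONS)))
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1                # learning rate, discount, exploration

def discretize(x, lo=-1.0, hi=1.0):
    """Map a continuous value into one of N_BINS state bins."""
    return int(np.clip((x - lo) / (hi - lo) * N_BINS, 0, N_BINS - 1))

def choose_action(state, rng):
    """Epsilon-greedy action selection over scaling factors."""
    if rng.random() < EPS:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[state]))

def pid_output(e, e_int, e_dot, scale, kp=1.0, ki=0.1, kd=0.05):
    """PID control law; 'scale' stands in for the variable-domain
    factor that would stretch or shrink the fuzzy universes."""
    return scale * (kp * e + ki * e_int + kd * e_dot)

# Toy closed loop: first-order plant tracking a unit step reference.
rng = np.random.default_rng(0)
y, e_prev, e_int = 0.0, 0.0, 0.0
state = (discretize(1.0), discretize(0.0))
for step in range(2000):
    e = 1.0 - y                                  # tracking error
    e_dot, e_int = e - e_prev, e_int + e
    a = choose_action(state, rng)
    u = pid_output(e, e_int, e_dot, ACTIONS[a])
    y += 0.05 * (-y + u)                         # simple first-order dynamics
    next_state = (discretize(1.0 - y), discretize((1.0 - y) - e))
    reward = -abs(1.0 - y)                       # penalize remaining error
    Q[state][a] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state][a])
    state, e_prev = next_state, e
```

In a full variable-domain fuzzy PID controller, the learned factor would rescale the fuzzy input/output universes rather than scale the control signal directly; the sketch only shows the online Q-Learning loop around the controller.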
Human action recognition has attracted extensive research effort in recent years, and traffic police gesture recognition in particular is important for self-driving vehicles. A crucial challenge in this task is finding a representation method based on spatial-temporal features. Existing methods perform poorly at fusing spatial and temporal information, and feature extraction for traffic police gestures has not been well researched. This paper proposes an attention mechanism based on an improved spatial-temporal convolutional neural network (AMSTCNN) for traffic police gesture recognition. The method focuses on the acting body parts of the traffic police officer and exploits the correlation between spatial and temporal features, ensuring that traffic police gesture information is not lost. Specifically, AMSTCNN integrates spatial and temporal information, uses weight matching to pay more attention to regions where human action occurs, and extracts region proposals from the image. Finally, Softmax classifies the actions after spatial-temporal feature fusion. AMSTCNN makes strong use of the spatial-temporal information in videos and selects effective features to reduce computation. Experiments on the AVA and Chinese traffic police gesture datasets show that our method is superior to several state-of-the-art methods.
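As a rough illustration of the kind of architecture the abstract describes (not the published AMSTCNN), the sketch below combines a per-frame spatial branch, a 3D-convolutional temporal branch, and a learned spatial attention map that re-weights regions where action occurs, followed by feature fusion and a Softmax classifier. All layer sizes, the nine-class output, and the class/module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (not the published AMSTCNN): spatial branch,
# temporal branch, learned spatial attention, fusion, Softmax head.

class SpatialTemporalAttentionNet(nn.Module):
    def __init__(self, num_classes=9):             # e.g. traffic-police gestures
        super().__init__()
        # Spatial branch: 2D features on a single frame.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Temporal branch: 3D convolutions over the whole clip.
        self.temporal = nn.Sequential(
            nn.Conv3d(3, 32, (3, 3, 3), padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, (3, 3, 3), padding=1), nn.ReLU(),
        )
        # Attention: 1x1 conv -> sigmoid gives a per-pixel weight map.
        self.attn = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())
        self.head = nn.Linear(128, num_classes)

    def forward(self, clip):                        # clip: (B, 3, T, H, W)
        mid = clip[:, :, clip.size(2) // 2]         # middle frame (B, 3, H, W)
        s = self.spatial(mid)                       # (B, 64, H, W)
        s = s * self.attn(s)                        # re-weight action regions
        t = self.temporal(clip).mean(dim=2)         # pool over time: (B, 64, H, W)
        fused = torch.cat([s, t], dim=1)            # spatial-temporal fusion
        pooled = fused.mean(dim=(2, 3))             # global average pool
        return self.head(pooled).softmax(dim=-1)    # class probabilities

# Usage: class probabilities for a batch of two 8-frame RGB clips.
model = SpatialTemporalAttentionNet()
probs = model(torch.randn(2, 3, 8, 64, 64))         # -> shape (2, 9)
```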