Abstract-Understanding human behaviors is a challenging problem in computer vision that has recently seen important advances. Human behavior understanding combines image and signal processing, feature extraction, machine learning and 3D geometry. Application scenarios range from surveillance to indexing and retrieval, from patient care to industrial safety and sports analysis. Given the broad set of techniques used in videobased behavior understanding and the fast progress in this area, in this paper we organize and survey the corresponding literature, define unambiguously key terms and discuss links among fundamental building blocks ranging from human detection to action and interaction recognition. The advantages and the drawbacks of the methods are critically discussed, providing a comprehensive coverage of key aspects of video-based human behavior understanding, available datasets for experimentation and comparisons, and important open research issues.