The analysis of human activities is one of the most intriguing and important open issues for the automated video surveillance community. Until a few years ago, it was handled from a purely Computer Vision and Pattern Recognition perspective, in which an activity corresponded to a temporal sequence of explicit actions (run, stop, sit, walk, etc.). Even under this simplistic assumption, the problem is hard, owing to the strong diversity in people's appearance, the number of individuals considered (we may monitor single individuals, groups, or crowds), the variability of the environmental conditions (indoor/outdoor, different weather conditions), and the kinds of sensors employed. More recently, the automated surveillance of human activities has been approached from a new perspective, called Social Signal Processing (SSP), which brings in notions and principles from the social, affective, and psychological literature. SSP primarily employs nonverbal cues, most of which lie outside conscious awareness, such as facial expressions and gaze, body posture and gestures, vocal characteristics, relative distances in space, and the like. This paper is the first review analyzing this new trend, proposing a structured snapshot of the state of the art and envisaging novel challenges in the surveillance domain where the cross-pollination of Computer Science technologies and Sociology theories may offer valid investigation strategies.