The study of nonverbal behavior (NVB), and in particular kinesics (i.e., face and body motions), is typically seen as cost-intensive. However, the development of new technologies (e.g., ubiquitous sensing, computer vision, and algorithms) and of new approaches to studying social behavior [i.e., social signal processing (SSP)] makes it possible to train algorithms to code NVB automatically, from action/motion units to inferences. Nonverbal social sensing refers to the use of these technologies and approaches for the study of kinesics based on video recordings. It appears to be a promising way to study NVB at reduced cost, which would make NVB a more attractive research field. However, does this promise hold? After presenting what nonverbal social sensing is and what it can do, we discuss the key challenges that researchers face when applying it to video data. Although nonverbal social sensing is a promising tool, researchers need to be aware that algorithms might be as biased as humans when extracting NVB and that automated NVB coding might remain context-dependent. We provide study examples to illustrate these challenges and point to potential solutions.