Human communication involves subtle non-verbal modes of expression that can be analyzed quantitatively with computational approaches, supporting research in the human sciences. In this paper we present huSync, a computational framework and system that uses trajectory information, extracted from video sequences with pose estimation algorithms, to quantify synchronization between individuals in small groups. We apply the system to study interpersonal coordination in musical ensembles. Musicians communicate with each other through sounds and gestures, providing non-verbal cues that regulate interpersonal coordination. huSync was applied to recordings of concert performances by a professional instrumental ensemble playing two musical pieces. We examined the effects of different aspects of musical structure (texture and phrase position) on interpersonal synchronization, which was quantified by computing phase locking values of head motion for all possible within-group pairs. Results indicate that interpersonal coupling was stronger for polyphonic textures (ambiguous leadership) than for homophonic textures (clear melodic leader), and that this difference was greater early in phrases than at phrase endings, where coordination demands are highest. Results were cross-validated against an analysis of audio features, revealing links between phase locking values and event density. This research produced a system, huSync, that quantifies synchronization in small groups from standard video recordings of naturalistic human group interaction, and that is sensitive to dynamic modulations of interpersonal coupling related to ambiguity in leadership and to coordination demands. huSync enables a better understanding of the relationship between interpersonal coupling and musical structure, thereby strengthening collaboration between human scientists and computer scientists.
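The phase locking value (PLV) used here measures how consistently two movement signals maintain a phase relationship over time. The following is a minimal Python sketch of the standard Hilbert-transform formulation, for illustration only; the exact preprocessing of the head-motion trajectories (e.g., band-pass filtering) is an assumption and not necessarily the authors' pipeline.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """PLV between two 1-D trajectories sampled at the same rate.

    PLV = |(1/N) * sum_t exp(i * (phi_x(t) - phi_y(t)))|, where phi
    is the instantaneous phase of the analytic (Hilbert) signal.
    Returns a value in [0, 1]; 1 means perfect phase locking.
    """
    phi_x = np.angle(hilbert(x - x.mean()))
    phi_y = np.angle(hilbert(y - y.mean()))
    return np.abs(np.mean(np.exp(1j * (phi_x - phi_y))))

# Hypothetical usage: head[i] holds performer i's head trajectory
# extracted from video; compute PLV for every within-group pair.
# pairs = {(i, j): phase_locking_value(head[i], head[j])
#          for i in range(n) for j in range(i + 1, n)}
```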
In small musical groups, performers can seem to coordinate their movements almost effortlessly in remarkable displays of joint action and entrainment. To achieve a common musical goal, co-performers interact and communicate through non-verbal means such as upper-body movements, particularly head motion. Studying these phenomena in naturalistic contexts is challenging because most techniques rely on motion capture technologies that can be intrusive and costly. To investigate an alternative method, we analyze video recordings of a professional instrumental ensemble, extracting trajectory information with pose estimation algorithms. We examine Kansei perspectives, such as the analysis of non-verbal expression conveyed by bodily movements and gestures, and test for causal relationships and directed influence between performers using the Granger causality method. We compute weighted probabilities representing the likelihood that each performer Granger-causes co-performers' movements. We examined the effects of different musical textures, and results indicated stronger directionality for homophonic textures (clear melodic leader) than for polyphonic textures (ambiguous leadership).
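Granger causality asks whether the past of one performer's motion improves prediction of another's motion beyond what the latter's own past provides. Below is a hedged Python sketch using statsmodels; the lag scan and the reduction to a single p-value are illustrative assumptions, as is any mapping from such statistics to the weighted probabilities described above.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

def granger_p(target, source, maxlag=10):
    """Smallest F-test p-value across lags 1..maxlag for the hypothesis
    that `source` Granger-causes `target`.

    grangercausalitytests expects a 2-column array and tests whether
    the second column helps predict the first. Taking the minimum
    p-value over lags is an assumption made here for illustration,
    not necessarily the paper's lag-selection procedure.
    """
    data = np.column_stack([target, source])
    res = grangercausalitytests(data, maxlag=maxlag, verbose=False)
    return min(r[0]["ssr_ftest"][1] for r in res.values())

# Hypothetical pairwise influence estimate for n performers:
# p_values[i][j] = granger_p(head[j], head[i])  # does i drive j?
```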