Across epochs and societies, humans occasionally gather to make music together. This universal form of collective behavior is as fascinating as it is poorly understood. As interest in joint music making (JMM) grows rapidly, we review the state of the art of this emerging science, bringing together behavioral, neural, and computational contributions. We present a theoretical framework that synthesizes cross-field research on JMM into four components. The framework is centered on interpersonal coordination, a crucial requirement for JMM. The other three components capture how individuals' (past) experience, (current) social traits, and (future) goals influence real-time coordination. Our work aims to promote the development of JMM research by organizing existing work, inspiring new questions, and fostering comparability with other research communities.