Music therapy has shown efficacy in serious and chronic conditions, mental disorders, and disabilities. However, much remains to be explored regarding the mechanisms through which music interventions exert their effects. A typical session involves interactions between the therapist, the client, and the musical work itself. To help address the challenges of capturing and comprehending these dynamics, we extend our general computational paradigm (CP) for analyzing the expressive and social behavioral processes in arts therapies. The extension covers bodily and non-verbal aspects of behavior, offering additional insights into the client’s emotional states and engagement. This version of the CP employs AI pose estimation technology, image processing, and audio analysis to capture therapy-related psychometrics and to support their intra- and inter-session analysis. We apply the CP in a real-world proof-of-concept study; the results enable us to pinpoint meaningful events and emergent properties not captured by the human eye, complementing the therapist’s interpretations. The resulting data may also be useful in other scientific and clinical areas.