The main goal of this thesis is to study the interaction between subject and object within MIR (Mixed Interactive Reality) technologies from a therapeutic perspective. To see how this interaction works, I first consider the general structures of DMI (Digital Musical Interface) technologies, then look at recently developed motion capture (MOCAP) systems, and end by proposing six technologies relevant to this context. These interactive technologies are significant in the HCI (Human-Computer Interface) context, from both a historical and an applied point of view. The first and second parts of the thesis introduce the theoretical aspects of the interaction. All the technologies described are grounded in the relationship between an action and the subject's perception of it. In the third part of the thesis, I describe proprioception, which is shaped by receptors that are fundamental both to movement and to self-awareness. I thus investigate the perceptual dimension of the interaction, referring to some fundamental theories in the philosophy of the body. This supports a holistic perspective in which the human being is considered as the subject and the technological environment as the object. I first discuss Presence, a concept borrowed from other authors that I revisit within the interactive environment, which becomes a second "presence" interacting with the user. I then introduce Gibson's theory of visual perception, dwelling on the concept of Affordance, which refers to the characteristics that the environment offers to the subject. I continue with the notion of IS (Image Schemata) and Johnson's notion of metaphor, which help in understanding the MIR environment. The Sensorimotor Contingency Theory (SCT) advanced by O'Regan and Noë also helps to elucidate how sensorimotor contingencies are involved in the perceptual process within an interactive system.
Finally, the Philosophy of Empowerment, interwoven with the aforementioned concepts, concerns the experience and perception that arise when we have full control over our actions during the interaction. With this theoretical basis in mind, we can understand the cross-modal interaction between subject and object, which, viewed holistically, involves more than one sensory modality at a time. Images and sounds thus appear to take part in the sensory integration that the user can experience within a MIR environment.