It has been suggested that the brain controls hand movements via internal models that rely on visual and proprioceptive cues about the state of the hand. In active inference formulations of such models, the relative influence of each modality on action and perception is determined by how precise (reliable) it is expected to be. The 'top-down' affordance of expected precision to a particular sensory modality is associated with attention. Here, we asked whether increasing attention to (i.e., the precision of) vision or proprioception would enhance performance in a hand-target phase-matching task in which visual and proprioceptive cues about hand posture were incongruent. We show that, in a simple simulated agent based on predictive coding formulations of active inference, increasing the expected precision of vision or proprioception improved task performance (target matching with the seen or felt hand, respectively) under visuo-proprioceptive conflict. Moreover, we show that this formulation captured the behaviour and self-reported attentional allocation of human participants performing the same task in a virtual reality environment. Together, our results show that selective attention can balance the impact of (conflicting) visual and proprioceptive cues on action, rendering attention a key mechanism for a flexible body representation for action.

Controlling the body's actions in a constantly changing environment is one of the most important tasks of the human brain. The brain solves the complex computational problems inherent in this task by using internal probabilistic (Bayes-optimal) models (1-6). These models allow the brain to flexibly estimate the state of the body and the consequences of its movement, despite noise and conduction delays in the sensorimotor apparatus, via iterative updating by sensory prediction errors from multiple sources. The state of the hand, in particular, can be informed by vision and proprioception.
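The precision-weighted integration of visual and proprioceptive estimates described here can be illustrated with a minimal sketch. This is not the paper's simulation; it simply assumes each cue is Gaussian (a mean and a precision), in which case the Bayes-optimal combined estimate is the precision-weighted average of the two cues. All names are illustrative.

```python
def integrate(mu_vis, pi_vis, mu_prop, pi_prop):
    """Precision-weighted fusion of two Gaussian cues about hand posture.

    Each cue is given as (mean, precision = 1/variance). The posterior
    mean is the precision-weighted average of the cue means, so the more
    precise (i.e., more attended) modality dominates the estimate.
    Illustrative sketch only -- not the agent used in the study.
    """
    pi_post = pi_vis + pi_prop
    mu_post = (pi_vis * mu_vis + pi_prop * mu_prop) / pi_post
    return mu_post, pi_post

# Visuo-proprioceptive conflict: vision signals 10 deg, proprioception 0 deg.
# Boosting visual precision pulls the combined estimate towards the seen hand.
mu, pi = integrate(10.0, 4.0, 0.0, 1.0)  # -> mu = 8.0, pi = 5.0
```

Under this scheme, reallocating attention corresponds to changing the precision weights, which shifts the combined estimate between the seen and felt hand positions without changing the cues themselves.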
Here, the brain makes use of an optimal integration of visual and proprioceptive signals, where the relative influence of each modality on the final 'multisensory' estimate is determined by its relative reliability or precision, depending on the current context (6-15).

These processes can be investigated under an experimentally induced conflict between visual and proprioceptive information. The underlying rationale is that incongruent visuo-proprioceptive cues about hand position or posture have to be integrated (provided the incongruence stays within reasonable limits), because the brain's body model entails a strong prior belief that information from both modalities is generated by one and the same external cause; namely, one's hand. Thus, a partial recalibration of one's unseen hand position towards the position of a (fake or mirror-displaced) hand seen in an incongruent position has been interpreted as an (attempted) resolution of visuo-proprioceptive conflict to maintain a prior body representation (16-23). Importantly, spatial or temporal perturbations can be introduced to vi...