According to the natural mindreading thesis (NMRT), mindreading, i.e., the capacity to attribute mental states in order to predict and explain behavior, is an intrinsic component of the human biological endowment and is thus innately specified by natural selection within particular neurocognitive structures. In this article, we challenge the NMRT as a phylogenetic and ontogenetic account of the development of our species' socio-cognitive capacities. Specifically, we argue that basic capacities of social cognition do not involve meta-representational mindreading. These include the capacities evidenced by traces of early systems of bodily ornamentation in the archeological record, and by infants' selective attention to others' beliefs in spontaneous-response false belief tasks. Such capacities are better explained by appealing to situated, embodied capacities acquired in social interaction. We acknowledge that more flexible capacities of social cognition involve genuine mindreading, for example those implied by the use of political emblems in industrialized societies, or by 4-year-olds' success in elicited-response false belief tasks. However, we argue that this ability is elicited and scaffolded by linguistic communication. We conclude that mindreading emerged as the outcome of a highly derivative, long-term constructivist process of biocultural becoming. This process led to a relatively recent restructuring of the human mind in multiple locations around the world at different times. In particular, we conjecture that humans gradually converged on linguistic practices that allow others' actions to be understood in terms of mental reasons. These practices were bequeathed to subsequent generations and continue to scaffold the acquisition of mindreading in early childhood.