Human joint action is generally facilitated by the tendency to represent not only one’s own task and behavior but also the partner’s. Yet under some conditions, such as in the joint Simon task, corepresentation can cause interference and hamper, rather than facilitate, joint performance. A competent cooperator should thus also be able to flexibly inhibit corepresentation when doing so is conducive to cooperation success. To investigate the evolutionary origin of corepresentation, as well as of the cooperative flexibility to inhibit it when necessary, we tested brown capuchins and Tonkean macaques in the joint Simon task and compared them with the previously tested marmosets. Corepresentation was present in all 3 species, but its strength and the cooperation success varied substantially. The cooperatively breeding marmosets showed the weakest corepresentation effect and, therefore, the highest cooperation success, and they were the only species to use mutual gaze when coordination with the partner was necessary. Cooperative flexibility was therefore not correlated with brain size but with the prevalence of cooperation in nature. This conclusion was corroborated by species differences in gazing patterns and suggests that the drivers of cooperative flexibility in humans were not solely cognitive.