Visual perspective taking (VPT) is a fundamental process of social cognition. To date, however, only a handful of studies have investigated whether humans also take the perspective of humanoid robots. Recent findings on this topic conflict: one study found no evidence for level 1 VPT (i.e., which object is seen by the agent), whereas another found evidence for level 2 VPT (i.e., how the object is seen by the agent). The latter study proposed that the human-like appearance of robots triggers VPT and that a mental capacity to perceive the environment is not required (mere-appearance hypothesis). In the present study, we tested whether the mere-appearance hypothesis also applies to level 1 VPT. We manipulated the appearance of a humanoid robot by showing it with either a human-like or an artificial head, and its mental capacity for perception by presenting it as switched on or off. We found that VPT was triggered under all manipulations, showing, in contrast to earlier findings, level 1 VPT for robots. Our findings support the mere-appearance hypothesis, as VPT was triggered regardless of whether the robot was switched on or off, and show that the hypothesis is robust to alterations of human-like appearance.