Public evaluations are popular because some research questions can only be answered by turning "to the wild." Different approaches place experimenters in different roles during deployment, which has implications for the kinds of data that can be collected and the potential bias introduced by the experimenter. This paper expands our understanding of how experimenter roles impact public evaluations and provides an empirical basis to consider different evaluation approaches. We completed an evaluation of a playful gesture-controlled display -not to understand interaction at the display but to compare different evaluation approaches. The conditions placed the experimenter in three roles, steward observer, overt observer, and covert observer, to measure the effect of experimenter presence and analyse the strengths and weaknesses of each approach.