Abstract. Whether neutrality has positive or negative effects on evolutionary search is a contentious topic, with reported experimental results supporting both sides of the debate. Most existing studies use performance statistics, e.g., success rate or search efficiency, to investigate whether neutrality, either embedded or artificially added, can benefit an evolutionary algorithm. Here, we argue that understanding the influence of neutrality on evolutionary optimization requires an understanding of the interplay between robustness and evolvability at the genotypic and phenotypic levels. As a concrete example, we consider a simple linear genetic programming system that is amenable to exhaustive enumeration and allows for the full characterization of these properties. We adopt statistical measurements from RNA systems to quantify robustness and evolvability at both genotypic and phenotypic levels. Using an ensemble of random walks, we demonstrate that the benefit of neutrality crucially depends upon its phenotypic distribution.