Context
Program synthesis is the task of automatically finding a program that satisfies the user's intent. In previous work, we developed APS‐GA, a program synthesizer based on a genetic algorithm. Like any genetic algorithm, APS‐GA depends on a fitness function. Researchers argue that using different distance metrics in the fitness function may change the behavior of a genetic algorithm. More recently, we presented initial evidence that APS‐GA was not affected by the choice of distance metric for its fitness function. However, that study was carried out on a medium‐sized scale.

Objective
To check whether our earlier finding holds on a larger scale, we replicated the experiment on a search space up to 6500 times larger than in our previous work, with a synthesis time at least 20 times longer. We used the same five distance metrics as fitness functions to check whether they affect the synthesis of five toy imperative programs over the integer domain.

Method
We formulated a hypothesis test and ran experiments to observe the number of calls to the fitness function and to measure the synthesis time.

Results
At a confidence level of 95%, we found no significant differences in either the number of fitness function calls or the synthesis time.

Conclusion
Our extended replication study suggests that the discrete distance metric is the best choice for APS‐GA, because it guides the search as effectively as the other metrics and is cheaper to compute.
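
To make the role of the distance metric concrete, the following is a minimal sketch of a distance-based fitness function. It is not APS‐GA's implementation. The function names, the input/output test-case format, and the use of absolute difference as the contrasting metric are all assumptions made for illustration; only the discrete metric is named in this abstract.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical test-case format: (input, expected output) pairs of integers.
Case = Tuple[int, int]
Distance = Callable[[int, int], int]


def discrete_distance(actual: int, expected: int) -> int:
    """Discrete metric: 0 when the two values are equal, 1 otherwise."""
    return 0 if actual == expected else 1


def absolute_distance(actual: int, expected: int) -> int:
    """Absolute difference, included only as a contrasting metric (assumption)."""
    return abs(actual - expected)


def fitness(candidate: Callable[[int], int],
            cases: Sequence[Case],
            distance: Distance) -> int:
    """Sum the distances between candidate outputs and expected outputs.

    Lower is better; 0 means the candidate passes every test case.
    """
    return sum(distance(candidate(x), expected) for x, expected in cases)


# Illustrative target behavior: doubling an integer.
cases = [(0, 0), (1, 2), (2, 4), (3, 6)]


def candidate_a(x: int) -> int:
    return x + 1  # matches the target only for x == 1


def candidate_b(x: int) -> int:
    return 2 * x  # matches the target on every case


print(fitness(candidate_a, cases, discrete_distance))  # 3
print(fitness(candidate_a, cases, absolute_distance))  # 4
print(fitness(candidate_b, cases, discrete_distance))  # 0
```

In this sketch the discrete metric only counts failed cases, so it needs a single comparison per case, whereas a graded metric also reports how far off each output is. Both assign a fitness of 0 to a fully correct candidate.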