Over the last decades, participatory approaches involving on-farm experimentation have become more prevalent in agricultural research. Nevertheless, these approaches remain difficult to scale because they usually require close attention from well-trained professionals. Novel large-N participatory trials, building on recent advances in citizen science and crowdsourcing methodologies, involve large numbers of participants and little researcher supervision. Reduced supervision may affect data quality, but the "Wisdom of Crowds" principle implies that many independent observations from a diverse group of people often lead to highly accurate results when taken together. In this study, we test whether farmer-generated data in agricultural citizen science are good enough to generate valid statements about the research topic. We experimentally assess the accuracy of farmer observations in trials of crowdsourced crop variety selection that use triadic comparisons of technologies (tricot). At five sites in Honduras, 35 farmers (women and men) participated in tricot experiments. They ranked three varieties of common bean (Phaseolus vulgaris L.) for plant vigor, plant architecture, pest resistance, and disease resistance. Furthermore, using a simulation approach based on the empirical data, we made an order-of-magnitude estimate of the number of participants needed to produce relevant results. The reliability of farmers' experimental observations was generally low (Kendall's W 0.174 to 0.676). However, aggregated observations contained information and had sufficient validity (Kendall's tau coefficient 0.33 to 0.76) to identify the correct ranking orders of varieties by fitting Mallows-Bradley-Terry models to the data. Our sample size simulation shows that low reliability can be compensated for by engaging higher numbers of observers to generate statistically meaningful results, demonstrating the usefulness of the Wisdom of Crowds principle in agricultural research.
In this first study on data quality from a farmer citizen science methodology, we show that realistic numbers of fewer than 200 participants can produce meaningful results for agricultural research through tricot-style trials.
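To make the reliability measure and the aggregation idea concrete, the following minimal Python sketch computes Kendall's W for a set of complete rankings without ties and aggregates noisy rankings by simple rank sums. It is an illustration under stated assumptions, not the analysis used in the study: the study fitted Mallows-Bradley-Terry models, whereas rank-sum aggregation is a simpler stand-in. The function names (`kendalls_w`, `aggregate_order`, `simulate_observers`) and the noise model (each observer reports the true ranking with probability `p_correct`, otherwise a uniformly random ranking) are hypothetical and chosen only for illustration.

```python
import random
from itertools import permutations


def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for m complete rankings
    of n items, assuming no tied ranks. Ranks run from 1 to n."""
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2  # expected rank sum per item
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def aggregate_order(rankings):
    """Return item indices ordered from best to worst by rank sum.
    This is a simple stand-in for a model-based aggregation;
    ties in rank sum keep the original index order."""
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    return sorted(range(n), key=lambda i: rank_sums[i])


def simulate_observers(n_observers, p_correct, true_ranks=(1, 2, 3), seed=0):
    """Hypothetical noise model: each observer reports the true ranking
    with probability p_correct, otherwise a uniformly random ranking."""
    rng = random.Random(seed)
    all_perms = list(permutations(true_ranks))
    observed = []
    for _ in range(n_observers):
        if rng.random() < p_correct:
            observed.append(tuple(true_ranks))
        else:
            observed.append(rng.choice(all_perms))
    return observed


if __name__ == "__main__":
    # Compare agreement and the aggregated order for growing crowd sizes.
    for n_obs in (10, 50, 200):
        data = simulate_observers(n_obs, p_correct=0.3)
        print(n_obs, round(kendalls_w(data), 3), aggregate_order(data))
```

Under this assumed noise model, W stays low because individual observers often disagree, while the aggregated order tends to become more stable as the number of observers grows. The actual sample size needed depends on the data and the model fitted.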