In this study, we used the phenotype simulation package naturalgwas to compare the performance of Zhao's Random Forest method with that of an uncorrected Random Forest test, latent factor mixed models (LFMM), genome-wide efficient mixed models (GEMMA), and confounder adjusted linear regression (CATE). We created 400 sets of phenotypes, corresponding to five effect sizes and two, five, 15, or 30 causal loci, simulated from two empirical data sets containing SNPs from Striped Bass representing three and 13 populations, respectively. All association methods were evaluated for their ability to detect genotype–phenotype associations based on power, false discovery rate, and number of false positives. Genomic inflation was highest for the uncorrected Random Forest and LFMM tests and lowest for GEMMA and Zhao's Random Forest. All association tests had similar power to detect causal loci, and Zhao's Random Forest had the lowest false discovery rate in all scenarios. To measure the performance of association tests in small data sets with few loci surrounding a causal gene, we reran the analyses after removing the causal loci from each data set. Across association tests, true positives, defined as loci located within 30 kbp of a causal locus, were found in only 3%–18% of simulations. In contrast, at least one false positive was found in 17%–44% of simulations. Zhao's Random Forest again identified the fewest false positives of all association tests studied. Testing the power of association tests on individual empirical data sets can be an extremely useful first step when designing a GWAS.
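The evaluation criteria named above (power, false discovery rate, and number of false positives, with a true positive defined as a locus within 30 kbp of a causal locus) can be illustrated with a minimal Python sketch. This is not the code used in the study. The names `Locus`, `is_near`, and `evaluate` are hypothetical, and the sketch assumes common definitions that the text does not spell out: power as the fraction of causal loci with at least one significant locus in the window, false discovery rate as the fraction of significant loci that lie outside every window, and a window boundary that is inclusive.

```python
from dataclasses import dataclass

WINDOW_BP = 30_000  # 30 kbp window taken from the text; inclusive boundary is an assumption


@dataclass(frozen=True)
class Locus:
    """Hypothetical container for a SNP position."""
    chrom: str
    pos: int


def is_near(a, b, window_bp=WINDOW_BP):
    """True if two loci share a chromosome and lie within window_bp of each other."""
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= window_bp


def evaluate(hits, causal_loci, window_bp=WINDOW_BP):
    """Summarize one simulation under the assumed definitions.

    hits        -- loci an association test reported as significant
    causal_loci -- simulated causal loci
    """
    # A hit is a false positive if it lies outside the window of every causal locus.
    false_hits = [
        h for h in hits
        if not any(is_near(h, c, window_bp) for c in causal_loci)
    ]
    # A causal locus counts as detected if any hit falls inside its window.
    detected = [
        c for c in causal_loci
        if any(is_near(h, c, window_bp) for h in hits)
    ]
    power = len(detected) / len(causal_loci) if causal_loci else 0.0
    fdr = len(false_hits) / len(hits) if hits else 0.0
    return {"power": power, "fdr": fdr, "false_positives": len(false_hits)}


# Toy example with made-up coordinates (not data from the study).
causal = [Locus("chr1", 100_000), Locus("chr2", 500_000)]
hits = [Locus("chr1", 120_000), Locus("chr1", 900_000), Locus("chr3", 500_000)]
summary = evaluate(hits, causal)
print(summary)
```

In this toy example one of the two causal loci is recovered, and two of the three significant loci fall outside every 30 kbp window, so they are counted as false positives. Averaging such summaries over simulated phenotype sets gives per-method estimates that can be compared across association tests.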