Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but it remains largely unclear if and when ML should be preferred over the conventionally used, often simpler parametric methods. The predictive performance of GP models may depend on a plethora of factors, including sample size, number of markers, population structure and genetic architecture. Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different levels of genetic complexity determined by the number of Quantitative Trait Loci (QTLs), heritability (h² and H²), population structure and linkage disequilibrium between causal nucleotides and other SNPs. Decision-tree-based ensemble ML methods are a reasonable choice for phenotypes with allelic interactions and are comparable to Bayesian methods for additive phenotypes in the case of large-effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive than linear parametric methods to low linkage disequilibrium. Overall, our results provide insights into the usefulness of ML for GP as well as guidelines for practitioners.
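To make the comparison setup concrete, the following is a minimal sketch (not the authors' actual pipeline) of how a decision-tree ensemble can be benchmarked against a GBLUP-like baseline on a simulated additive trait. Here ridge regression on SNP dosages stands in for GBLUP, and the sample size, marker count, QTL number and heritability are illustrative placeholders only.

```python
# Illustrative sketch: Random Forest vs. a GBLUP-like ridge baseline on simulated SNP data.
# All parameter values (n, p, n_qtl, h2) are assumptions chosen for demonstration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n, p, n_qtl, h2 = 500, 2000, 20, 0.6          # individuals, SNPs, QTLs, heritability

# Simulate SNP dosages (0/1/2) and an additive phenotype controlled by n_qtl causal markers.
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta = np.zeros(p)
qtl = rng.choice(p, n_qtl, replace=False)
beta[qtl] = rng.normal(size=n_qtl)
g = X @ beta                                   # genetic values
e = rng.normal(scale=np.sqrt(g.var() * (1 - h2) / h2), size=n)
y = g + e                                      # phenotype with target heritability

models = {
    "RandomForest": RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0),
    "GBLUP-like ridge": Ridge(alpha=p * (1 - h2) / h2),  # shrinkage tied to h2, approximating GBLUP
}

# Predictive ability: Pearson correlation between observed and cross-validated predictions.
for name, model in models.items():
    preds = np.empty(n)
    for train, test in KFold(5, shuffle=True, random_state=0).split(X):
        model.fit(X[train], y[train])
        preds[test] = model.predict(X[test])
    print(f"{name}: r = {pearsonr(y, preds)[0]:.2f}")
```

In a full benchmark along the lines described above, the same cross-validation loop would be repeated across traits that vary in QTL number, heritability, population structure and linkage disequilibrium, and extended with the remaining methods (XGBoost, RKHS, BayesA, BayesB).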