The efficiency of marker-assisted prediction of phenotypes has been studied intensively for different types of plant breeding populations. However, one remaining question is how to incorporate and balance information from biparental and multiparental populations in model training for genome-wide prediction. To address this question, we evaluated testcross performance of 1652 doubled-haploid maize (Zea mays L.) lines that were genotyped with 56,110 single nucleotide polymorphism markers and phenotyped for five agronomic traits in four to six European environments. The lines are arranged in two diverse half-sib panels representing two major European heterotic germplasm pools. The data set contains 10 related biparental dent families and 11 related biparental flint families generated from crosses of maize lines important for European maize breeding. With this new data set, we analyzed genome-based best linear unbiased prediction in different validation schemes and compositions of estimation and test sets. Further, we theoretically and empirically investigated marker linkage phases across multiparental populations. In general, predictive abilities similar to or higher than those within biparental families could be achieved by combining several half-sib families in the estimation set. For the majority of families, 375 half-sib lines in the estimation set were sufficient to reach the same predictive performance for biomass yield as an estimation set of 50 full-sib lines. In contrast, prediction across heterotic pools was not possible in most cases. Our findings are important for experimental design in genome-based prediction as they provide guidelines for the genetic structure and required sample size of data sets used for model training.
In the context of quantitative trait locus (QTL) mapping, multiparental populations have been suggested to be advantageous over biparental families due to their greater allelic diversity and the possibility of evaluating allelic effects in multiple genetic backgrounds (Muranty 1996; Xu 1998; Verhoeven et al. 2006). Especially if a multiparental population consists of several families connected by common parents, it can provide greater power of QTL detection and better resolution of QTL localization than individual families (Rebai and Goffinet 1993; Jannink and Jansen 2001; Blanc et al. 2006; Yu et al. 2008; Bardol et al. 2013; Mackay et al. 2014). In the context of genome-based prediction (Meuwissen et al. 2001), accuracies achieved within large biparental families are assumed to be the maximum that can be obtained with a given sample size (Crossa et al. 2014), because of medium allele frequencies, absence of genetic substructure, and equal linkage phases between markers and functional polymorphisms. However, prediction accuracies for newly generated progenies from different crosses will be poor. This is especially true if the respective germplasm exhibits broad allelic diversity and is unrelated to the biparental family from which single nucleotide polymorphism (...