A number of clustering algorithms are available to depict population genetic structure (PGS) with genomic data; however, there is no consensus on which methods perform best. We conducted a simulation study of three PGS scenarios with k = 2, 5 and 10 subpopulations, recreating several maize genomes as a model, to: (i) compare three well-known clustering methods: UPGMA, k-means, and a Bayesian method (BM); (ii) assess four internal validation indices (CH, Connectivity, Dunn and Silhouette) for determining a reliable number of groups defining a PGS; and (iii) estimate the misclassification rate for each validation index. Moreover, a publicly available maize dataset was used to illustrate the outcomes of our simulation. BM was the best method for classifying individuals in all tested scenarios, with no assignment errors. Conversely, UPGMA had the highest misclassification rate. In scenarios with 5 and 10 subpopulations, the CH and Connectivity indices showed the greatest underestimation of the number of groups for all clustering algorithms. The Dunn and Silhouette indices performed best with BM. Nevertheless, since Silhouette measures the degree of confidence in cluster assignment, and BM measures the probability of cluster membership, these results should be considered with caution. In this study, BM proved efficient at depicting the PGS in both simulated and real maize datasets. This study offers a robust alternative for unveiling the existing PGS, thereby facilitating population studies and breeding strategies in maize programs. Moreover, the present findings may have implications for other crop species.
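As a rough illustration of how an internal validation index such as Silhouette can be used to choose the number of groups, the sketch below computes the mean silhouette width for two candidate partitions of a toy dataset and keeps the one with the higher score. The data, the candidate labelings and the function name `silhouette_mean` are assumptions made for this example; they are not the maize simulations or the software used in the study, and Euclidean distance stands in for whatever genetic distance a real analysis would use.

```python
from math import dist


def silhouette_mean(points, labels):
    """Mean silhouette width of a hard clustering, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # usual convention: a singleton contributes s(i) = 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): three well-separated groups in two dimensions.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]

# Candidate partitions, as a clustering method might return them for k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
}

scores = {k: silhouette_mean(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The same pattern applies to the other indices mentioned above: compute the index for each candidate k and keep the k that optimizes it, bearing in mind that the indices can disagree with one another.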
Genomic diversity, expressed as the differences between the molecular haplotypes of a group of individuals, can be divided into components of variability between and within the levels of some factor that classifies the individuals. This variance partitioning is done with the analysis of molecular variance (AMOVA), which is built from the multivariate distances between pairs of haplotypes. The classical AMOVA allows the statistical significance of two or more hierarchical factors to be evaluated, and consequently it provides no test of interaction between factors. However, there are situations where the factors that classify individuals are crossed rather than nested, that is, all the levels of one factor are represented in each level of the other. This paper proposes a statistical test to evaluate the interaction between crossed factors in a Non-Hierarchical AMOVA. The null hypothesis for the interaction test states that the molecular differences between individuals of different levels of one factor are the same for all the levels of the other factor that classifies them. The proposed analysis of interaction in a Non-Hierarchical AMOVA includes: calculation of the distance matrix and its partition into blocks, subsequent calculation of residuals, and a non-parametric analysis of variance on the residuals. Its implementation is illustrated in simulated and real scenarios. The results suggest that the proposed interaction test for the Non-Hierarchical AMOVA has high power. Key words: genetic variability, non-parametric methods, distance matrix, AMOVA.
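To make the distance-based variance partition concrete, the sketch below shows the classical one-factor AMOVA decomposition of the total sum of squares into within-group and among-group parts, computed from a matrix of squared pairwise distances, together with a simple label-permutation p-value. It is not the interaction test proposed in the paper: the haplotypes, the group labels and the function names `amova_partition` and `permutation_p_value` are hypothetical, and the number of differing sites is assumed as the squared distance between binary haplotypes.

```python
import random


def amova_partition(d2, groups):
    """Return (SS_total, SS_within, SS_among) from squared pairwise distances."""
    n = len(groups)
    # Total sum of squares: sum of squared distances over all pairs, divided by N.
    ss_total = sum(d2[i][j] for i in range(n) for j in range(i + 1, n)) / n
    members = {}
    for idx, g in enumerate(groups):
        members.setdefault(g, []).append(idx)
    # Within-group sum of squares: the same quantity inside each group, then summed.
    ss_within = 0.0
    for idxs in members.values():
        pair_sum = sum(d2[i][j] for k, i in enumerate(idxs) for j in idxs[k + 1:])
        ss_within += pair_sum / len(idxs)
    return ss_total, ss_within, ss_total - ss_within


def permutation_p_value(d2, groups, n_perm=999, seed=0):
    """P-value for SS_among obtained by shuffling group labels (assumed scheme)."""
    rng = random.Random(seed)
    observed = amova_partition(d2, groups)[2]
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if amova_partition(d2, shuffled)[2] >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical binary haplotypes and a single classification factor.
haplotypes = ["0000", "0001", "1110", "1111"]
groups = ["A", "A", "B", "B"]

# Squared distance between two haplotypes: number of sites at which they differ.
d2 = [[sum(x != y for x, y in zip(h1, h2)) for h2 in haplotypes] for h1 in haplotypes]

ss_total, ss_within, ss_among = amova_partition(d2, groups)
p_value = permutation_p_value(d2, groups)
print(ss_total, ss_within, ss_among, p_value)
```

With only four individuals the permutation p-value cannot be small, so the example is meant to show the mechanics of the partition rather than a realistic test.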