As population structure can result in spurious associations, it has constrained the use of association studies in human and plant genetics. Association mapping, however, holds great promise if true signals of functional association can be separated from the vast number of false signals generated by population structure. We have developed a unified mixed-model approach to account for multiple levels of relatedness simultaneously as detected by random genetic markers. We applied this new approach to two samples: a family-based sample of 14 human families, for quantitative gene expression dissection, and a sample of 277 diverse maize inbred lines with complex familial relationships and population structure, for quantitative trait dissection. Our method demonstrates improved control of both type I and type II error rates over other methods. As this new method crosses the boundary between family-based and structured association samples, it provides a powerful complement to currently available methods for association mapping.
Historically, association tests have been used extensively in medical genetics, but have had virtually no application in plant genetics. One obstacle to their application is the structured populations often found in crop plants, which may lead to nonfunctional, spurious associations. In this study, statistical methods to account for population structure were extended for use with quantitative variation and applied to our evaluation of maize flowering time. Mutagenesis and quantitative trait locus (QTL) studies suggested that the maize gene Dwarf8 might affect the quantitative variation of maize flowering time and plant height. The wheat orthologs of this gene contributed to the increased yields seen in the 'Green Revolution' varieties. We used association approaches to evaluate Dwarf8 sequence polymorphisms from 92 maize inbred lines. Population structure was estimated using a Bayesian analysis of 141 simple sequence repeat (SSR) loci. Our results indicate that a suite of polymorphisms associate with differences in flowering time, which include a deletion that may alter a key domain in the coding region. The distribution of nonsynonymous polymorphisms suggests that Dwarf8 has been a target of selection.
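The way population structure can produce a nonfunctional association, and how accounting for structure removes it, can be illustrated with a deliberately small toy example. This is a sketch under stated assumptions, not the analysis used in the study: the genotype codes, trait values, and subpopulation labels are invented for illustration, and simple within-subpopulation centering stands in for the Bayesian structure estimates described above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_within(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    totals = {}
    for v, g in zip(values, groups):
        s, c = totals.get(g, (0.0, 0))
        totals[g] = (s + v, c + 1)
    return [v - totals[g][0] / totals[g][1] for v, g in zip(values, groups)]


# Hypothetical data: subpopulation B carries more copies of the allele AND
# has a higher trait mean, but within each subpopulation genotype and trait
# are unrelated.
genotype = [0, 0, 1, 1, 1, 1, 2, 2]
trait = [1.0, 2.0, 1.0, 2.0, 5.0, 6.0, 5.0, 6.0]
subpop = ["A"] * 4 + ["B"] * 4

# Ignoring structure: a sizeable positive correlation appears.
naive_r = pearson(genotype, trait)

# Accounting for structure by centering within subpopulation: it vanishes.
adjusted_r = pearson(center_within(genotype, subpop),
                     center_within(trait, subpop))
```

In this toy data the pooled correlation is driven entirely by the difference between subpopulation means, which is the kind of signal a structure-aware test is meant to discount.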
We review and extend a recent suggestion that fine-scale localization of a disease-susceptibility locus for a complex disease be done on the basis of deviations from Hardy-Weinberg equilibrium among affected individuals. This deviation is driven by linkage disequilibrium between disease and marker loci in the whole population and requires a heterogeneous genetic basis for the disease. A finding of marker-locus Hardy-Weinberg disequilibrium therefore implies disease heterogeneity and marker-disease linkage disequilibrium. Although a lack of departure from Hardy-Weinberg equilibrium at marker loci implies that disease-susceptibility-weighted linkage disequilibria are zero, given disease heterogeneity, it does not follow that the usual measures of linkage disequilibrium are zero. For disease-susceptibility loci with more than two alleles, therefore, care is needed in the drawing of inferences from marker Hardy-Weinberg disequilibria.
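As a rough illustration of the quantity involved, the Hardy-Weinberg disequilibrium at a biallelic marker among affected individuals can be measured by the coefficient D_A = P_AA - p_A^2, with the standard one-degree-of-freedom chi-square statistic n D_A^2 / (p_A^2 (1 - p_A)^2). The sketch below assumes a biallelic marker and simple genotype counts; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def hw_disequilibrium(n_aa, n_ab, n_bb):
    """Return (p_A, D_A, chi-square) for genotype counts at a biallelic marker.

    D_A = P_AA - p_A**2 is zero under Hardy-Weinberg equilibrium.
    The chi-square statistic has one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p_a = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    d_a = n_aa / n - p_a ** 2              # disequilibrium coefficient
    denom = (p_a * (1 - p_a)) ** 2
    chi2 = n * d_a ** 2 / denom if denom > 0 else 0.0
    return p_a, d_a, chi2


# Hypothetical counts among affected individuals.
in_equilibrium = hw_disequilibrium(25, 50, 25)   # exact HW proportions
heterozygote_deficit = hw_disequilibrium(40, 20, 40)
```

For a biallelic marker this chi-square value matches the usual goodness-of-fit comparison of observed and expected genotype counts.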
Estimates of genetic population structure (F_ST) were constructed from all autosomes in two large SNP data sets. The Perlegen data set contains genotypes on ∼1 million SNPs segregating in all three samples of Americans of African, Asian, and European descent; and the Phase I HapMap data set contains genotypes on ∼0.6 million SNPs segregating in all four samples from specific Caucasian, Chinese, Japanese, and Yoruba populations. Substantial heterogeneity of F_ST values was found between segments within chromosomes, although there was similarity between the two data sets. There was also substantial heterogeneity among population-specific F_ST values, with the relative sizes of these values often changing along each chromosome. Population-structure estimates are often used as indicators of natural selection, but the analyses presented here show that individual-marker estimates are too variable to be useful. There is inherent variation in these statistics because of variation in genealogy even among neutral loci, and values at pairs of loci are correlated to an extent that reflects the linkage disequilibrium between them. Furthermore, it may be that the best indications of selection will come from population-specific F_ST values rather than the usually reported population-average values.
Publication of the Perlegen SNP data set (Hinds et al. 2005) and completion of Phase I of the International HapMap Project (The International HapMap Consortium 2005) have allowed a new perspective on the genetic structure of human populations. These two whole-genome data sets allow population genetic analyses at an unprecedented scale: Previous estimates of genetic population structure (for review, see Garte 2003) have been based on a limited number of loci and provided only average figures of quantities such as F_ST (Wright 1951) across the whole genome.
The precision of previous estimates is not high, and they relate only to specific genes rather than to the region in which the markers are located. We can expect there to be some diversity in the magnitude of population structure between regions of the genome because the precise genealogy is not the same for each chromosome or part thereof, with values becoming increasingly similar the more closely linked are the regions. The genealogy can differ both by random events and by non-random events such as selection. Strong selection at a locus will induce hitchhiking of nearby regions (Maynard Smith and Haigh 1974), leading to both a reduction in heterozygosity within populations and an increase in diversity between populations as measured by F_ST. Examination of the differences in diversity between regions therefore provides an opportunity to identify those that cannot be explained solely in terms of random sampling of the genealogy due to Mendelian segregation, variation in family size, migration, and recombination between genetic sites.
Methods for estimating F_ST from samples of a group of populations are well established (e.g., Weir and Cockerham 1984). More recently they have been ...
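To make the quantity concrete, a single-SNP population-average value can be sketched from the classical heterozygosity definition F_ST = (H_T - H_S) / H_T. This is a minimal illustration only: it assumes known allele frequencies and equal population weights, and it is not the Weir and Cockerham (1984) estimator, which also corrects for sample size. The function name and the example frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    Populations are weighted equally; no sample-size correction is applied.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)           # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                          # SNP is monomorphic overall
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-pop
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations.
no_structure = fst_biallelic([0.5, 0.5])
moderate = fst_biallelic([0.2, 0.8])
fixed_difference = fst_biallelic([0.0, 1.0])
```

Values computed one SNP at a time in this way are the kind of individual-marker figures that the abstract above describes as highly variable, which is one reason to summarize over segments rather than single markers.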