Senior Corresponding Authors: Matthew E. Hurles, The Wellcome Trust Sanger Institute,
Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
Extensive studies are currently being performed to associate disease susceptibility with one form of genetic variation, namely single nucleotide polymorphisms (SNPs). In recent years another type of common genetic variation has been characterised, namely structural variation, including copy number variations (CNVs). To determine the overall contribution of CNVs to complex phenotypes, we performed association analyses of the expression levels of 14,925 transcripts with SNPs and CNVs in individuals who are part of the International HapMap Project. SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively, but the signals from the two types of variation had little overlap. Interrogating the genome for both types of variants may be an effective way to elucidate the causes of complex phenotypes and disease in humans.

Understanding the genetic basis of phenotypic variation in human populations is currently one of the major goals in human genetics. Gene expression (the transcription of DNA into messenger RNA) has been interrogated in a variety of species and experimental scenarios to investigate the genetic basis of variation in gene regulation (1–8), and to tease apart regulatory networks (9, 10). In some respects, a comprehensive survey of gene expression …

* Correspondence should be addressed to: Emmanouil T. Dermitzakis (md4@sanger.ac.uk; +44-1223-494866) or Matthew E. Hurles (meh@sanger.ac.uk; +44-1223-495377)

… (26) and www.sanger.ac.uk/humgen/cnv/data). Log2 ratios from two sets of clones were analyzed: the whole set of 24,963 autosomal clones (CGH clones) and the 1,322 autosomal clones corresponding to CNVs present in at least two HapMap individuals (CNV clones) (26). We excluded genes on sex chromosomes because their copy number differs between males and females.
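The clone measurements above are log2 ratios restricted to autosomes. A minimal Python sketch of that preprocessing step follows; it assumes hypothetical clone records of the form (clone_id, chromosome, test intensity, reference intensity), and the record layout and function names are illustrative only, not the study's actual pipeline.

```python
import math

# Chromosome labels treated as autosomes (1-22); X and Y are excluded.
AUTOSOMES = {str(i) for i in range(1, 23)}


def log2_ratio(test: float, reference: float) -> float:
    """Log2 of the test-to-reference hybridization intensity ratio for one clone."""
    return math.log2(test / reference)


def autosomal_log2_ratios(clones):
    """Return {clone_id: log2 ratio} for autosomal clones only.

    `clones` is an iterable of hypothetical (clone_id, chromosome, test, reference)
    tuples. Clones on the sex chromosomes are skipped, because copy number there
    differs between males and females.
    """
    return {
        clone_id: log2_ratio(test, ref)
        for clone_id, chrom, test, ref in clones
        if chrom in AUTOSOMES
    }


clones = [
    ("clone_A", "1", 200.0, 100.0),  # two-fold gain relative to reference
    ("clone_B", "X", 50.0, 100.0),   # sex chromosome: dropped
    ("clone_C", "7", 100.0, 200.0),  # two-fold loss relative to reference
]
print(autosomal_log2_ratios(clones))  # {'clone_A': 1.0, 'clone_C': -1.0}
```

A log2 ratio of 0 means equal copy number in test and reference; +1 and -1 correspond to a two-fold gain and loss, which is why the ratio is reported on a log2 scale.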
We performed linear regression (on each of the 4 populations separately) between normalized quantitative gene expression values and either SNP genotypes or clone log2 ratios near the gene (SNP position within 1 Mb, or clone midpoint within 2 Mb, of the probe midpoint position). We used different window sizes for SNPs and clones because clones are large (median size ~170 kb) and structural variants can exert long-range effects (21), so a 2 Mb window is more appropriate for clones. Statistical significance was evaluated using permutations (27), as previously described (1), and a permutation-corrected p-value threshold of 0.001 was applied (see Methods). Repeated permutation runs showed that these thresholds were very stable (see Supplementary Table 4). Because we test a large number of genes, an additional correction for multiple testing is required. This can be done either by tightening the threshold so that every gene passing it is expected to be truly significant (e.g., Bonferroni correction) or by setting the threshold to a value that yields a satisfactory false discovery rate (FDR). We used the second approach and estimated the FDR based on the number of genes tested and E...
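The cis-association procedure described above can be sketched in Python as follows. The sketch assumes hypothetical variant records of the form {"pos": ..., "values": [...]}, holding a SNP position or clone midpoint plus one genotype or log2 ratio per individual. Because the permutation method is only cited here, it assumes the common per-gene scheme in which the best association across all cis variants is recorded in each permutation. When all variants for a gene share the same sample size (assuming no missing genotypes), ranking by regression R^2 is equivalent to ranking by regression p-value, so the sketch uses the maximum R^2 in place of the minimum p-value and avoids a t-distribution dependency. The `estimated_fdr` formula is also an assumption, since the source text describing the estimate is truncated.

```python
import random
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation, i.e. the R^2 of a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant or constant expression: no association
    return sxy * sxy / (sxx * syy)


def cis_variants(variants, probe_mid, window):
    """Keep variants whose position (SNP) or midpoint (clone) lies within `window` bp of the probe midpoint."""
    return [v for v in variants if abs(v["pos"] - probe_mid) <= window]


def best_r2(expr, variants):
    """Strongest association (largest R^2) between one gene's expression and any of its cis variants."""
    return max((r_squared(v["values"], expr) for v in variants), default=0.0)


def permutation_test(expr, variants, n_perm=10_000, tail=0.001, seed=0):
    """Per-gene permutation threshold (assumed scheme, see lead-in).

    Expression values are shuffled across individuals, breaking any genotype link
    while keeping the correlation structure among the cis variants. The best R^2
    from each permutation forms the null distribution; the gene is called
    significant if its observed best R^2 exceeds the (1 - tail) quantile of it.
    """
    rng = random.Random(seed)
    observed = best_r2(expr, variants)
    shuffled = list(expr)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(best_r2(shuffled, variants))
    null.sort()
    threshold = null[min(n_perm - 1, round((1 - tail) * n_perm))]
    return observed, threshold, observed > threshold


def estimated_fdr(n_genes_tested, n_significant, tail=0.001):
    """Assumed FDR estimate: expected chance hits (genes tested x tail) / observed hits."""
    if n_significant == 0:
        return 0.0
    return min(1.0, n_genes_tested * tail / n_significant)
```

In use, `permutation_test` would be run once per gene and per population, with `window=1_000_000` for SNP positions and `window=2_000_000` for clone midpoints. Pure Python with 10,000 permutations is slow at genome scale; the sketch is meant to show the logic, not to be an efficient implementation.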