Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, 'missing' heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
Many common human diseases and traits are known to cluster in families and are believed to be influenced by several genetic and environmental factors, but until recently the identification of genetic variants contributing to these 'complex diseases' has been slow and arduous 1. Genome-wide association studies (GWAS), in which several hundred thousand to more than a million single nucleotide polymorphisms (SNPs) are assayed in thousands of individuals, represent a powerful new tool for investigating the genetic architecture of complex diseases 1,2. In the past few years, these studies have identified hundreds of genetic variants associated with such conditions and have provided valuable insights into the complexities of their genetic architecture 3,4.
The genome-wide association (GWA) method represents an important advance compared to 'candidate gene' studies, in which sample sizes are generally smaller and the variants assayed are limited to a selected few, often on the basis of imperfect understanding of biological pathways and often yielding associations that are difficult to replicate 5,6. GWAS are also an important step beyond family-based linkage studies, in which inheritance patterns are related to several hundreds to thousands of genomic markers. Despite many clear successes in single-gene 'Mendelian' disorders 7,8, the limited success of linkage studies in complex diseases has been attributed to their low power and resolution for variants of modest effect 9-11.
The underlying rationale for GWAS is the 'common disease, common variant' hypothesis, positing that common diseases are attributable in part to allelic variants present in more than 1-5% of the population 12-14. They have been facilitated by the development of commercial 'SNP chips' or arrays that capture most, although not all, common variation in the genome. Although the allelic architecture of some conditions, notably age-related macular degeneration, for the most part reflects the contributions of several variants of large effect (defined loosely here as those increasing disease risk by twofold or more), most common variants individually or in combination confer relatively small increments in risk (1.1-1.5-fold) and explain only a small proportion of heritability, the portion of phenotypic variance in a population attributable to additive ...
Summary: Host genetics and the gut microbiome can both influence metabolic phenotypes. However, whether host genetic variation shapes the gut microbiome and interacts with it to affect host phenotype is unclear. Here, we compared microbiotas across >1,000 fecal samples obtained from the TwinsUK population, including 416 twin pairs. We identified many microbial taxa whose abundances were influenced by host genetics. The most heritable taxon, the family Christensenellaceae, formed a co-occurrence network with other heritable Bacteria and with methanogenic Archaea. Furthermore, Christensenellaceae and its partners were enriched in individuals with low body mass index (BMI). An obese-associated microbiome was amended with Christensenella minuta, a cultured member of the Christensenellaceae, and transplanted to germ-free mice. C. minuta amendment reduced weight gain and altered the microbiome of recipient mice. Our findings indicate that host genetics influence the composition of the human gut microbiome and can do so in ways that impact host metabolism.
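The summary above does not say how heritability was estimated from the twin pairs. As a rough illustration only, the sketch below uses Falconer's classical approximation, h² ≈ 2(r_MZ − r_DZ), which compares trait correlations in monozygotic and dizygotic pairs. The function names and the toy abundance values are hypothetical and are not taken from the study, which may well have used a different model.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def falconer_h2(mz_pairs, dz_pairs):
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ).

    Each argument is a list of (twin_1_value, twin_2_value) tuples,
    e.g. the relative abundance of one taxon in each twin.
    The result is not clamped, so noisy data can fall outside [0, 1].
    """
    r_mz = pearson_r([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
    r_dz = pearson_r([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
    return 2.0 * (r_mz - r_dz)


if __name__ == "__main__":
    # Made-up values for illustration; not data from the study.
    mz = [(0.10, 0.12), (0.30, 0.28), (0.05, 0.07), (0.22, 0.20)]
    dz = [(0.10, 0.20), (0.30, 0.15), (0.05, 0.09), (0.22, 0.25)]
    print(f"Falconer h^2 estimate: {falconer_h2(mz, dz):.2f}")
```

A higher correlation among monozygotic than dizygotic pairs yields a positive estimate. With small samples the estimate is noisy, which is one reason variance-component models are often preferred in practice.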
The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all of the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a pipeline that greatly facilitates this process. This program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete long terminal repeat (LTR) retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately 3 times more consensus sequences matching with >95% sequence identity and sequence coverage to the manually curated sequences than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license (https://github.com/Dfam-consortium/RepeatModeler, http://www.repeatmasker.org/RepeatModeler/).
Detecting selective sweeps from genomic SNP data is complicated by the intricate ascertainment schemes used to discover SNPs, and by the confounding influence of the underlying complex demographics and varying mutation and recombination rates. Current methods for detecting selective sweeps have little or no robustness to the demographic assumptions and varying recombination rates, and provide no method for correcting for ascertainment biases. Here, we present several new tests aimed at detecting selective sweeps from genomic SNP data. Using extensive simulations, we show that a new parametric test, based on composite likelihood, has a high power to detect selective sweeps and is surprisingly robust to assumptions regarding recombination rates and demography (i.e., has low Type I error). Our new test also provides estimates of the location of the selective sweep(s) and the magnitude of the selection coefficient. To illustrate the method, we apply our approach to data from the Seattle SNP project and to Chromosome 2 data from the HapMap project. In Chromosome 2, the most extreme signal is found in the lactase gene, which previously has been shown to be undergoing positive selection. Evidence for selective sweeps is also found in many other regions, including genes known to be associated with disease risk such as DPP10 and COL4A3.
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.