The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one-phenotype-at-a-time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden to single phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. There is a boost in power for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides 37% increased discovery. 
The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.
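For context, the Friedewald formula estimates LDL cholesterol as a fixed linear combination of total cholesterol, HDL cholesterol, and triglycerides, which is why a fitted linear combination of lipid traits can be compared against it. The following is a minimal sketch, assuming concentrations in mg/dL; the function name, the weights tuple, and the example values are illustrative and are not taken from the MultiPhen software or the NFBC1966 data.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol in mg/dL as LDL = TC - HDL - TG / 5."""
    return total_chol - hdl - triglycerides / 5.0


# The same formula written as weights on (TC, HDL, TG); a linear combination
# of lipid traits estimated from data could be compared against these.
FRIEDEWALD_WEIGHTS = (1.0, -1.0, -0.2)

# Hypothetical example values in mg/dL, for illustration only.
example_ldl = friedewald_ldl(200.0, 50.0, 150.0)
print(example_ldl)
```

The formula is generally regarded as unreliable at high triglyceride levels (roughly above 400 mg/dL), so the sketch should not be applied outside that range.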
Post-copulatory sexual selection can select for sperm allocation strategies in males [1, 2] but males should also strategically allocate non-sperm components of the ejaculate [3, 4] such as seminal fluid proteins (Sfps). Sfps can influence the extent of post-copulatory sexual selection [5–7] but little is known of the causes or consequences of quantitative variation in Sfp production and transfer. Using Drosophila melanogaster, we demonstrate that Sfps are strategically allocated to females in response to the potential level of sperm competition. We also show that males who can produce and transfer larger quantities of specific Sfps have a significant competitive advantage. When males were exposed to a competitor male, matings were longer and more of two key Sfps, sex peptide [8] and ovulin [9], were transferred, indicating strategic allocation of Sfps. Males selected for large accessory glands (AGs, a major site of Sfp synthesis) produced and transferred significantly more sex peptide, but not more ovulin. Large AG males also had significantly increased competitive reproductive success. Our results show that quantitative variation in specific Sfps is likely to play an important role in post-copulatory sexual selection and that investment in Sfp production is essential for male fitness in a competitive environment.
High-density linkage maps are important tools for genome biology and evolutionary genetics because they quantify the extent of recombination, linkage disequilibrium, and chromosomal rearrangements across chromosomes, sexes, and populations. They provide one of the best ways to validate and refine de novo genome assemblies, with the power to identify errors in assemblies increasing with marker density. However, assembly of high-density linkage maps is still challenging due to software limitations. We describe Lep-MAP2, software for ultradense genome-wide linkage map construction. Lep-MAP2 can handle various family structures and can account for achiasmatic meiosis to improve linkage map accuracy. Simulations show that Lep-MAP2 outperforms other available mapping software both in computational efficiency and accuracy. When applied to two large F2-generation recombinant crosses between two nine-spined stickleback (Pungitius pungitius) populations, it produced two high-density (∼6 markers/cM) linkage maps containing 18,691 and 20,054 single nucleotide polymorphisms. The two maps showed a high degree of synteny, but female maps were 1.5–2 times longer than male maps in all linkage groups, suggesting genome-wide recombination suppression in males. Comparison with the genome sequence of the three-spined stickleback (Gasterosteus aculeatus) revealed a high degree of interspecific synteny with a low frequency (<5%) of interchromosomal rearrangements. However, a fairly large (ca. 10 Mb) translocation from autosome to sex chromosome was detected in both maps. These results illustrate the utility and novel features of Lep-MAP2 in assembling high-density linkage maps, and their usefulness in revealing evolutionarily interesting properties of genomes, such as strong genome-wide sex bias in recombination rates.
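Map lengths in centimorgans, such as the female and male map lengths compared above, are obtained from recombination fractions through a mapping function. The sketch below shows the two textbook choices, Haldane and Kosambi. Which function was used for the stickleback maps is not stated here, and the function names are illustrative rather than part of Lep-MAP2.

```python
import math


def haldane_cm(r):
    """Map distance in cM from recombination fraction r, assuming no crossover interference."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Map distance in cM from recombination fraction r, allowing for crossover interference."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# For small r both functions are close to 100 * r; they diverge as r approaches 0.5.
for r in (0.01, 0.1, 0.3):
    print(r, round(haldane_cm(r), 3), round(kosambi_cm(r), 3))
```

With achiasmatic meiosis in one sex, the recombination fraction within a chromosome is zero for that sex, so both functions give a map distance of zero there.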
Dogs are of increasing interest as models for human diseases, and many canine population-association studies are beginning to emerge. The choice of breeds for such studies should be informed by a knowledge of factors such as inbreeding, genetic diversity, and population structure, which are likely to depend on breed-specific selective breeding patterns. To address the lack of such studies we have exploited one of the world's most extensive resources for canine population-genetics studies: the United Kingdom (UK) Kennel Club registration database. We chose 10 representative breeds and analyzed their pedigrees since electronic records were established around 1970, corresponding to about eight generations before present. We find extremely inbred dogs in each breed except the greyhound and estimate an inbreeding effective population size between 40 and 80 for all but 2 breeds. For all but 3 breeds, >90% of unique genetic variants are lost over six generations, indicating a dramatic effect of breeding patterns on genetic diversity. We introduce a novel index C for measuring population structure directly from the pedigree and use it to identify subpopulations in several breeds. As well as informing the design of canine population genetics studies, our results have implications for breeding practices to enhance canine welfare.
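An inbreeding effective population size is conventionally derived from the per-generation rate of increase in the mean inbreeding coefficient. The sketch below uses the textbook relations ΔF = (F_t − F_{t−1}) / (1 − F_{t−1}) and N_e = 1 / (2ΔF). It is an illustration under that assumption; the exact estimator applied to the Kennel Club pedigrees is not given here, and the function names and input values are hypothetical.

```python
def inbreeding_rate(f_prev, f_curr):
    """Per-generation rate of inbreeding from mean inbreeding coefficients of consecutive generations."""
    if not 0.0 <= f_prev < 1.0:
        raise ValueError("f_prev must satisfy 0 <= f_prev < 1")
    return (f_curr - f_prev) / (1.0 - f_prev)


def effective_population_size(delta_f):
    """Inbreeding effective population size, N_e = 1 / (2 * delta_f)."""
    if delta_f <= 0.0:
        raise ValueError("delta_f must be positive")
    return 1.0 / (2.0 * delta_f)


# Hypothetical mean inbreeding coefficients for two consecutive generations.
delta_f = inbreeding_rate(0.0, 0.01)
print(effective_population_size(delta_f))
```

Under these relations, a mean inbreeding coefficient rising by about 1% per generation corresponds to an effective size of about 50, which falls inside the 40 to 80 range reported above.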
The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic and genomic research. In particular, a wealth of genetic and genomic data has been generated for the three-spined stickleback (Gasterosteus aculeatus), the ‘ecology’s supermodel’, while the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and ca. 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromeric-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years (MYA) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 MYA. Compared to the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.
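The contig N50 quoted above is the length L such that contigs of length at least L together cover at least half of the total assembly. A minimal sketch of that calculation follows; the function name and the toy contig lengths are illustrative and unrelated to the P. pungitius assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: total length 32, so the two 8-unit contigs already cover half.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Comparing `2 * running` with `total` avoids floating-point division when the lengths are integers.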