Hybridization plays an important role in the evolution of certain groups of organisms, adaptation to their environments, and diversification of their genomes. The evolutionary histories of such groups are reticulate, and methods for reconstructing them are still in their infancy and have limited applicability. We present a maximum likelihood method for inferring reticulate evolutionary histories while accounting simultaneously for incomplete lineage sorting. Additionally, we propose methods for assessing confidence in the amount of reticulation and the topology of the inferred evolutionary history. Our method obtains accurate estimates of reticulate evolutionary histories on simulated datasets. Furthermore, our method provides support for a hypothesis of a reticulate evolutionary history inferred from a set of house mouse (Mus musculus) genomes. As evidence of hybridization in eukaryotic groups accumulates, it is essential to have methods that infer reticulate evolutionary histories. The work we present here allows for such inference and provides a significant step toward putting phylogenetic networks on par with phylogenetic trees as a model of capturing evolutionary relationships.
reticulate evolution | incomplete lineage sorting | phylogenetic networks | maximum likelihood
Phylogenetic trees have long been a mainstay of biology, providing an interpretive model of the evolution of molecules and characters and a backdrop against which comparative genomics and phenomics are conducted. Nevertheless, some evolutionary events, most notably horizontal gene transfer in prokaryotes and hybridization in eukaryotes, necessitate going beyond trees (1). These events result in reticulate evolutionary histories, which are best modeled by phylogenetic networks (2). The topology of a phylogenetic network is given by a rooted, directed, acyclic graph (rDAG) that is leaf-labeled by a set of taxa (Fig. 1; more details are provided in Model and SI Appendix).
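To make the rDAG definition above concrete, here is a minimal Python sketch that checks whether an edge list describes a rooted, directed, acyclic graph and reports its leaves and reticulation nodes. It is an illustration only, not code from the paper: the function name `summarize_network`, the edge-list representation, and the toy node labels are all assumptions, and a reticulation node is taken here to be any node with two or more parents.

```python
from collections import defaultdict, deque


def summarize_network(edges):
    """Validate a (parent, child) edge list as a rooted DAG and summarize it.

    Hypothetical helper for illustration; not part of any published tool.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # A rooted network has exactly one node without parents.
    roots = [n for n in nodes if indegree[n] == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")

    # Kahn's algorithm: the graph is acyclic only if every node can be
    # removed in topological order.
    remaining = {n: indegree[n] for n in nodes}
    queue = deque(roots)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children.get(node, []):
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if visited != len(nodes):
        raise ValueError("graph contains a directed cycle")

    leaves = sorted(n for n in nodes if not children.get(n))
    # Assumption: a node with two or more parents is a reticulation node.
    reticulations = sorted(n for n in nodes if indegree[n] >= 2)
    return {"root": roots[0], "leaves": leaves, "reticulations": reticulations}


# Toy network: taxon B descends from hybrid node "h", which has parents u and v.
TOY_EDGES = [
    ("root", "u"), ("root", "v"),
    ("u", "A"), ("u", "h"),
    ("v", "h"), ("v", "C"),
    ("h", "B"),
]

if __name__ == "__main__":
    print(summarize_network(TOY_EDGES))
```

A tree is the special case in which the list of reticulation nodes is empty; each additional reticulation node adds one hybridization or gene-transfer event to the model.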
Reticulation events result in genomic regions with local genealogies that are incongruent with the speciation pattern. Several methods and heuristics use this incongruence as a signal for inferring reticulation events and reconstructing phylogenetic networks from local genealogies. These methods, which are surveyed elsewhere (2-4), assume that reticulation events are the sole cause of all incongruence among the gene trees and seek phylogenetic networks to explain all of the incongruence. A serious limitation of these methods is that they would grossly overestimate the amount of reticulation in a dataset when other causes of incongruence are at play. Indeed, several recent studies (5-9) have shown that detecting hybridization in practice can be complicated by the presence of incomplete lineage sorting (ILS) (Fig. 1). Recently, a set of methods was devised to analyze data where reticulation and ILS might both be simultaneously at play (10-15). However, these methods are all applicable to simple scenarios of species evolution and mostly assume a known hypothesis about the topol...
We report on a genome-wide scan for introgression between the house mouse (Mus musculus domesticus) and the Algerian mouse (Mus spretus), using samples from the ranges of sympatry and allopatry in Africa and Europe. Our analysis reveals wide variability in introgression signatures along the genomes, as well as across the samples. We find that fewer than half of the autosomes in each genome harbor all detectable introgression, whereas the X chromosome has none. Further, European mice carry more M. spretus alleles than the sympatric African ones. Using the length distribution and sharing patterns of introgressed genomic tracts across the samples, we infer, first, that at least three distinct hybridization events involving M. spretus have occurred, one of which is ancient, and the other two are recent (one presumably due to warfarin rodenticide selection). Second, several of the inferred introgressed tracts contain genes that are likely to confer adaptive advantage. Third, introgressed tracts might contain driver genes that determine the evolutionary fate of those tracts. Further, functional analysis revealed introgressed genes that are essential to fitness, including the Vkorc1 gene, which is implicated in rodenticide resistance, and olfactory receptor genes. Our findings highlight the extent and role of introgression in nature and call for careful analysis and interpretation of house mouse data in evolutionary and genetic studies.
Mus musculus | Mus spretus | hybridization | adaptive introgression | PhyloNet-HMM
Classical laboratory mouse strains, as well as newly established wild-derived ones, are widely used by geneticists for answering a diverse array of questions (1). Understanding the genome contents and architecture of these strains is important for studies of natural variation and complex traits, as well as evolutionary studies in general (2). Mus spretus, a sister species of Mus musculus, impacts the findings in M. musculus investigations for at least two reasons. First, it was deliberately interbred with laboratory M. musculus strains to introduce genetic variation (3). Second, Mus musculus domesticus is partially sympatric (naturally cooccurring) with M. spretus (Fig. 1).
Recent studies have examined admixture between subspecies of house mice (5-8), but have not studied introgression with M. spretus. In at least one case (5), the introgressive descent of the mouse genome was hidden due to data postprocessing that masked introgressed genomic regions as missing data. In another study reporting whole-genome sequencing of 17 classical laboratory strains (6), M. spretus was used as an outgroup for phylogenetic analysis. The authors were surprised to find that 12.1% of loci failed to place M. spretus as an outgroup to the M. musculus clade. The authors concluded that M. spretus was not a reliable outgroup but did not pursue their observation further. On the other hand, in a 2002 study (9), Orth et al. compiled data on allozyme, microsatellite, and mitochondrial variation in house mice from Spain (sympatry) and...
Hematogenous metastasis is initiated by a subset of circulating tumor cells (CTC) shed from primary or metastatic tumors into the blood circulation. Thus, CTCs provide a unique patient biopsy resource to decipher the cellular subpopulations that initiate metastasis and their molecular properties. However, one crucial question is whether CTCs derived and expanded ex vivo from patients recapitulate human metastatic disease in an animal model. Here, we show that CTC lines established from patients with breast cancer are capable of generating metastases in mice with a pattern recapitulating most major organs from corresponding patients. Genome-wide sequencing analyses of metastatic variants identified semaphorin 4D as a regulator of tumor cell transmigration through the blood-brain barrier and MYC as a crucial regulator for the adaptation of disseminated tumor cells to the activated brain microenvironment. These data provide the direct experimental evidence of the promising role of CTCs as a prognostic factor for site-specific metastasis.
SIGNIFICANCE: Interests abound in gaining new knowledge of the physiopathology of brain metastasis. In a direct metastatic tropism analysis, we demonstrated that ex vivo-cultured CTCs from 4 patients with breast cancer showed organotropism, revealing molecular features that allow a subset of CTCs to enter and grow in the brain.
One outcome of interspecific hybridization and subsequent effects of evolutionary forces is introgression, which is the integration of genetic material from one species into the genome of an individual in another species. The evolution of several groups of eukaryotic species has involved hybridization, and cases of adaptation through introgression have been already established. In this work, we report on PhyloNet-HMM—a new comparative genomic framework for detecting introgression in genomes. PhyloNet-HMM combines phylogenetic networks with hidden Markov models (HMMs) to simultaneously capture the (potentially reticulate) evolutionary history of the genomes and dependencies within genomes. A novel aspect of our work is that it also accounts for incomplete lineage sorting and dependence across loci. Application of our model to variation data from chromosome 7 in the mouse (Mus musculus domesticus) genome detected a recently reported adaptive introgression event involving the rodent poison resistance gene Vkorc1, in addition to other newly detected introgressed genomic regions. Based on our analysis, it is estimated that about 9% of all sites within chromosome 7 are of introgressive origin (these cover about 13 Mbp of chromosome 7, and over 300 genes). Further, our model detected no introgression in a negative control data set. We also found that our model accurately detected introgression and other evolutionary processes from synthetic data sets simulated under the coalescent model with recombination, isolation, and migration. Our work provides a powerful framework for systematic analysis of introgression while simultaneously accounting for dependence across sites, point mutations, recombination, and ancestral polymorphism.
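The idea of labeling sites along a genome with a hidden state can be illustrated with a generic two-state hidden Markov model decoded by the Viterbi algorithm. The Python sketch below is not PhyloNet-HMM: the two states, the "concordant"/"discordant" observation alphabet, and every probability are made-up assumptions chosen only to show how an HMM captures dependence between neighboring sites. In the actual framework the hidden states and emissions are derived from a phylogenetic network and sequence data, as the abstract describes.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path (log-space Viterbi)."""
    if not observations:
        raise ValueError("observations must not be empty")
    # best[t][s]: log-probability of the best path that ends in state s at site t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# All values below are hypothetical and for illustration only.
STATES = ("species", "introgressed")
LOG_START = to_log({"species": 0.9, "introgressed": 0.1})
LOG_TRANS = to_log({
    "species": {"species": 0.9, "introgressed": 0.1},
    "introgressed": {"species": 0.1, "introgressed": 0.9},
})
LOG_EMIT = to_log({
    "species": {"concordant": 0.9, "discordant": 0.1},
    "introgressed": {"concordant": 0.2, "discordant": 0.8},
})

if __name__ == "__main__":
    sites = ["concordant"] * 3 + ["discordant"] * 4 + ["concordant"] * 3
    print(viterbi(sites, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Because switching states is penalized by the transition probabilities, a run of discordant sites is grouped into one contiguous tract rather than being classified site by site, which is the kind of within-genome dependence the abstract refers to.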
Background: Branching events in phylogenetic trees reflect bifurcating and/or multifurcating speciation and splitting events. In the presence of gene flow, a phylogeny cannot be described by a tree but is instead a directed acyclic graph known as a phylogenetic network. Both phylogenetic trees and networks are typically reconstructed using computational analysis of multi-locus sequence data. The advent of high-throughput sequencing technologies has brought about two main scalability challenges: (1) dataset size in terms of the number of taxa and (2) the evolutionary divergence of the taxa in a study. The impact of both dimensions of scale on phylogenetic tree inference has been well characterized by recent studies; in contrast, the scalability limits of phylogenetic network inference methods are largely unknown.
Results: In this study, we quantify the performance of state-of-the-art phylogenetic network inference methods on large-scale datasets using empirical data sampled from natural mouse populations and a range of simulations using model phylogenies with a single reticulation. We find that, as in the case of phylogenetic tree inference, the performance of leading network inference methods is negatively impacted by both dimensions of dataset scale. In general, we found that topological accuracy degrades as the number of taxa increases; a similar effect was observed with increased sequence mutation rate. The most accurate methods were probabilistic inference methods which maximize either likelihood under coalescent-based models or pseudo-likelihood approximations to the model likelihood. The improved accuracy obtained with probabilistic inference methods comes at a computational cost in terms of runtime and main memory usage, which become prohibitive as dataset size grows past twenty-five taxa. None of the probabilistic methods completed analyses of datasets with 30 taxa or more after many weeks of CPU runtime.
Conclusions: We conclude that the state of the art of phylogenetic network inference lags well behind the scope of current phylogenomic studies. New algorithmic development is critically needed to address this methodological gap.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1277-1) contains supplementary material, which is available to authorized users.