Before populations become independent evolutionary lineages, microevolutionary processes tend to generate complex scenarios of diversification that may affect phylogenetic reconstruction. Not accounting for gene flow in species tree estimates can directly affect topology, effective population sizes, and branch lengths, and the resulting estimation errors are still poorly understood in wild populations. In this study, we used an integrative approach, including sequence capture of ultraconserved elements (UCEs), mtDNA Sanger sequencing, and morphological data, to investigate species limits and phylogenetic relationships in the face of gene flow in an Amazonian endemic species (Myrmoborus lugubris: Aves). We used commonly implemented species tree and model-based approaches to understand the potential effects of gene flow on phylogenetic reconstructions. The genetic structure observed was congruent with the four recognized subspecies of M. lugubris. Morphological and UCE data supported the presence of a wide hybrid zone between M. l. femininus from the Madeira River and M. l. lugubris from the middle and lower Amazon River, which were recovered as sister taxa by species tree methods. When fitting gene flow into simulated demographic models with different topologies, the best-fit model indicated these two taxa as non-sister lineages, a finding that agrees with the results of the mitochondrial and morphological analyses. Our results demonstrate that failing to account for gene flow when estimating phylogenies at shallow divergence levels can generate topological uncertainty, which can nevertheless be statistically well supported, and that model testing approaches using simulated data can be useful tools to test alternative phylogenetic hypotheses.
Abstract.-Advances in high-throughput sequencing techniques now allow relatively easy and affordable sequencing of large portions of the genome, even for non-model organisms. Many phylogenetic studies prefer to reduce costs by focusing their sequencing efforts on a selected set of targeted loci, commonly enriched using sequence capture. The advantage of this approach is that it recovers a consistent set of loci, each with high sequencing depth, which leads to more confidence in the assembly of target sequences. High sequencing depth can also be used to identify phylogenetically informative allelic variation within sequenced individuals, but allele sequences are infrequently assembled in phylogenetic studies. Instead, many scientists perform their phylogenetic analyses using contig sequences which result from the de novo assembly of sequencing reads into contigs containing only canonical nucleobases, and this may reduce both statistical power and phylogenetic accuracy. Here, we develop an easy-to-use pipeline to recover allele sequences from sequence capture data, and we use simulated and empirical data to demonstrate the utility of integrating these allele sequences into analyses performed under the Multispecies Coalescent (MSC) model. Our empirical analyses of Ultraconserved Element (UCE) locus data collected from the South American hummingbird genus Topaza demonstrate that phased allele sequences carry sufficient phylogenetic information to infer the genetic structure, lineage divergence, and biogeographic history of a genus that diversified during the last three million years, support the recognition of two species, and suggest a high rate of gene flow across large distances of rainforest habitats but rare admixture across the Amazon River. Our simulations show that analyzing allele sequences leads to more accurate estimates of tree topology and divergence times than the more common approach of using contig sequences. We conclude that allele phasing may be the most appropriate processing scheme for phylogenetic analyses of UCE data in particular, and sequence capture data, more generally.

[...] (Fig. 4). Hereafter, we use "contigs" and "contig sequences" to refer to the sequences that are output by de novo assemblers.
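To make the distinction between contig and allele sequences concrete, the following minimal sketch compares two phased allele sequences from one individual with a single contig sequence that contains only canonical nucleobases. It is not the pipeline described above: the function names, the toy sequences, and the rule used to pick a base at each heterozygous site are illustrative assumptions.

```python
# Illustrative sketch only. The helper names, toy sequences, and the
# base-picking rule are assumptions, not the pipeline described in the text.

def heterozygous_sites(allele_a, allele_b):
    """Return 0-based positions where two aligned phased alleles differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


def toy_contig(allele_a, allele_b):
    """Build a single sequence of canonical bases from two alleles.

    Assumption: at heterozygous sites the base is taken alternately from
    allele A and allele B, to show that a contig can end up matching
    neither allele. A real assembler's choice depends on the reads.
    """
    contig = list(allele_a)
    for k, pos in enumerate(heterozygous_sites(allele_a, allele_b)):
        contig[pos] = allele_a[pos] if k % 2 == 0 else allele_b[pos]
    return "".join(contig)


allele_a = "ACGTTGCAAC"
allele_b = "ACATTGCGAC"

sites = heterozygous_sites(allele_a, allele_b)
contig = toy_contig(allele_a, allele_b)

print("heterozygous sites:", sites)
print("contig:", contig)
print("contig matches an allele:", contig in (allele_a, allele_b))
```

In this toy case the two heterozygous sites are collapsed into one sequence that is identical to neither allele, which illustrates the kind of allelic information that is lost when only contig sequences are analyzed.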
One alternative approach to generating contig sequences uses the depth of [...] estimation of gene trees, species trees, and divergence times (Garrick et al. 2010; Potts et al. 2014; Lischer et al. 2014). The common practice of neglecting allelic information in phylogenetic studies possibly results from historical inertia and a lack of computational pipelines to prepare allele sequences for phylogenetic analysis using MPS data.

bioRxiv preprint first posted online Jan. 29, 2018; doi: http://dx.doi.org/10.1101/255752. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.
In addition to the problems of determining allelic se...