2021
DOI: 10.1002/aps3.11441

HybPhaser: A workflow for the detection and phasing of hybrids in target capture data sets

Abstract: Reticulation events caused by hybridization are common and important sources of novelty in angiosperm evolution (Wood et al., 2009; Palfalvi et al., 2020). The detection, investigation, and representation of hybridization remain a challenge in phylogenomics (Kellogg, 2016; Mallet et al., 2016; Spooner et al., 2020). The combination of divergent genomes in hybrids (herein used for any organism that contains divergent genomes due to a hybridization event, e.g., many polyploids) introduces conflicting phylogenetic …

Cited by 47 publications (46 citation statements); references 52 publications.

“…Whereas HybPiper by default constructs the most likely allele for a —supposedly single-copy— gene based on the relative nucleotide frequency of each heterozygous site, HybPhaser instead takes SNP variation into account using nucleotide ambiguity codes, and uses this to quantify divergence between gene variants to detect paralogy and hybridisation. Single genes with high SNP count are likely paralogs, while samples with high SNP count across all genes are likely hybrids or polyploids (Nauheimer et al 2021). Putative paralogs, genes with high SNP count compared to other genes, can be removed from the dataset.…”
Section: Methods (mentioning; confidence: 99%)

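As a rough illustration of the idea in the statement above, the Python sketch below computes the proportion of ambiguity-coded sites per locus, flags loci whose mean value is an upper outlier as putative paralogs, and averages the remaining loci per sample as a "SNP load" that would be elevated in hybrids or polyploids. It assumes per-sample consensus sequences in which heterozygous sites are already encoded as IUPAC ambiguity codes. The function names and the Q3 + 1.5 * IQR cutoff are hypothetical choices for this example, not HybPhaser's own code or default thresholds.

```python
from statistics import mean, quantiles

# IUPAC ambiguity codes for two or more bases; N is treated as missing data, not as a SNP.
AMBIGUITY_CODES = set("RYSWKMBDHV")
MISSING = set("N-?")


def snp_proportion(seq: str) -> float:
    """Proportion of called (non-missing) sites encoded as ambiguity codes."""
    called = [base for base in seq.upper() if base not in MISSING]
    if not called:
        return 0.0
    return sum(base in AMBIGUITY_CODES for base in called) / len(called)


def flag_paralogs(consensus: dict) -> set:
    """Flag loci whose mean SNP proportion is an upper outlier (hypothetical Q3 + 1.5 * IQR rule).

    consensus maps sample -> {locus: consensus sequence}; needs at least two loci.
    """
    loci = sorted({locus for seqs in consensus.values() for locus in seqs})
    # Mean SNP proportion of each locus across the samples in which it was recovered.
    locus_means = {
        locus: mean(snp_proportion(seqs[locus])
                    for seqs in consensus.values() if locus in seqs)
        for locus in loci
    }
    q1, _, q3 = quantiles(locus_means.values(), n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    return {locus for locus, m in locus_means.items() if m > cutoff}


def sample_snp_load(consensus: dict, exclude=frozenset()) -> dict:
    """Mean SNP proportion per sample across the loci that were not excluded."""
    return {
        sample: mean(snp_proportion(seq)
                     for locus, seq in seqs.items() if locus not in exclude)
        for sample, seqs in consensus.items()
    }
```

In use, one would call `flag_paralogs` first and pass its result as `exclude` to `sample_snp_load`, so that a single highly variable locus does not inflate every sample's load. A sample whose load stays consistently above that of its relatives across the retained loci matches the pattern the statement associates with hybrids; where to draw that line is left open here.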
“…What complicates this interpretation is that gene heterozygosity values in both species of >90% are unexpectedly high and in line with what would be expected of hybrids (Nauheimer et al 2021). This raises the possibility that the entire genus may have a polyploidisation event in its recent ancestry, even before the duplication in Pogonolepis muelleriana.…”
Section: Reproductive Systems and Gene Concordance (mentioning; confidence: 99%)

“…Gene heterozygosity, i.e. the percentage of loci that were heterozygous in a sample, was inferred using the first two steps of the HybPhaser pipeline (Nauheimer et al 2021). To compare heterozygosity between the two species with their different breeding systems, I calculated mean and median gene heterozygosity across all samples of a species.…”
Section: Laboratory Procedures (mentioning; confidence: 99%)

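A minimal Python sketch of the gene heterozygosity summary described in this statement: the percentage of recovered loci that are heterozygous in each sample, then the mean and median of that percentage per species. It assumes, as a simplification, that a locus counts as heterozygous when its consensus sequence carries at least one IUPAC ambiguity code. The helper names and the sample-to-species mapping are hypothetical; this is not the HybPhaser implementation.

```python
from statistics import mean, median

# IUPAC ambiguity codes for two or more bases (N = missing data, so it is excluded).
AMBIGUITY_CODES = set("RYSWKMBDHV")


def is_heterozygous(seq: str) -> bool:
    """Hypothetical criterion: at least one ambiguity-coded site in the locus."""
    return any(base in AMBIGUITY_CODES for base in seq.upper())


def gene_heterozygosity(loci: dict) -> float:
    """Percentage of recovered loci (locus -> sequence) that are heterozygous in one sample."""
    if not loci:
        raise ValueError("sample has no recovered loci")
    return 100 * sum(is_heterozygous(seq) for seq in loci.values()) / len(loci)


def species_summary(samples: dict, species_of: dict) -> dict:
    """Return {species: (mean, median)} of per-sample gene heterozygosity."""
    by_species = {}
    for sample, loci in samples.items():
        by_species.setdefault(species_of[sample], []).append(gene_heterozygosity(loci))
    return {sp: (mean(values), median(values)) for sp, values in by_species.items()}
```

Reporting the median alongside the mean, as the statement does, keeps a single unusually heterozygous sample from dominating the species-level comparison.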
“…Phasing copies across loci is related to the problem of haplotype assembly—the phasing of sequencing reads within a locus: in both cases, the goal is to avoid chimeric data that are a mix of multiple evolutionary histories. In the assembly problem, however, a researcher can rely on physical linkage to determine which reads belong to which haplotype (Kates et al, 2018; Schrinner et al, 2020; Majidian et al, 2020; Nauheimer et al, 2021; Tiley et al, 2021). This approach is not available in the locus-phasing case, where the loci are separated from each other by unsequenced regions; the only information available to determine whether two gene copies come from the same subgenome is in the phylogenetic history itself.…”
Section: Introduction (mentioning; confidence: 99%)

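To make the contrast in this statement concrete, here is a toy Python illustration of how physical linkage resolves phase within a locus: each read that covers two heterozygous sites votes for either the "cis" arrangement (alleles a-c and b-d on the same haplotype) or the "trans" arrangement (a-d and b-c). This is a simplified majority-vote sketch with hypothetical names, not the algorithm of any of the cited tools. Between loci separated by unsequenced regions, no read spans both, so this kind of evidence is unavailable.

```python
from collections import Counter


def phase_two_sites(reads, site1, site2, alleles1, alleles2):
    """Toy read-backed phasing of two heterozygous sites.

    reads: list of dicts mapping reference position -> observed base.
    alleles1 / alleles2: the two alleles observed at site1 / site2.
    Returns (haplotypes, votes); haplotypes is None when the evidence ties.
    """
    a, b = alleles1
    c, d = alleles2
    votes = Counter()
    for read in reads:
        # Only reads that physically cover both sites carry linkage information.
        if site1 not in read or site2 not in read:
            continue
        pair = (read[site1], read[site2])
        if pair in {(a, c), (b, d)}:
            votes["cis"] += 1      # supports haplotypes a-c and b-d
        elif pair in {(a, d), (b, c)}:
            votes["trans"] += 1    # supports haplotypes a-d and b-c
        # any other combination (e.g. a sequencing error) is ignored
    if votes["cis"] == votes["trans"]:
        return None, votes
    if votes["cis"] > votes["trans"]:
        return [(a, c), (b, d)], votes
    return [(a, d), (b, c)], votes
```

Reads covering only one of the two sites contribute nothing, which is exactly the situation across loci: without a spanning read, the copies at each locus cannot be linked, and the assignment must instead come from the phylogenetic signal.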
“…Beyond this by-eye approach, and approaches that rely upon existing reference sequences, such as that of Hénocq et al (2020) and Nauheimer et al (2021), there are, to our knowledge, three currently available methods for phasing copies across loci. First, Bertrand et al (2015) developed an approach to phase two loci by finding the largest set of sequence pairs such that any incongruence between the two resulting gene trees could be due to stochastic or coalescent error.…”
Section: Introduction (mentioning; confidence: 99%)