2020
DOI: 10.1093/sysbio/syaa081

A Cautionary Note on the Use of Genotype Callers in Phylogenomics

Abstract: Next-generation-sequencing genotype callers are commonly used in studies to call variants from newly-sequenced species. However, due to the current availability of genomic resources, it is still common practice to use only one reference genome for a given genus, or even one reference for an entire clade of a higher taxon. The problem with traditional genotype callers, such as the one from GATK, is that they are optimized for variant calling at the population level. However, when these callers are used at the p…

Cited by 15 publications (8 citation statements)
References 53 publications
“…In contrast to the demographic history results, the bias that reference-genome selection plays on genetic diversity estimates is more noticeable, regardless of software choice. Although we did not perform an exhaustive test of all software available, meaning our results may be somewhat software-specific, previous work has also shown increasingly erroneous heterozygosity rates when using more divergent references (using the commonly implemented tool gatk; Duchen & Salamin, 2020; McKenna et al, 2010), suggesting that this may be a common issue across software. Reference bias is known to cause heterozygous sites to be incorrectly called as homozygous for the reference allele (Brandt et al, 2015; Ros-Freixedes, 2018).…”
Section: Discussion (mentioning)
confidence: 94%
“…Ideally, such studies should include a high‐quality species‐specific reference genome, which is often not available for nonmodel organisms. Given the costs and time associated with generating a de novo reference genome, it can be more realistic to use an existing one, yet more distantly related, as is done most often in population genomic studies (Duchen & Salamin, 2021). Currently, several empirical studies have examined the impact of nonconspecific reference genomes in population genomics.…”
Section: Introduction (mentioning)
confidence: 99%
“…The impact of phylogenetic distance or reference choice on locus recovery and variant calling was not tested in the current study. For example, phylogenetic distance between samples and a given reference genome has been shown to differentially impact variant scoring depending on the chosen genotype caller (Duchen and Salamin, 2020), a finding that certainly warrants additional study. In future applications of ISSRseq, users are encouraged to explore alternative SNP calling software or approaches for generating data matrices to be used in phylogenomic inference.…”
Section: Methodological Considerations Suggestions and Caveats (mentioning)
confidence: 99%