2020
DOI: 10.21203/rs.3.rs-32336/v1
Preprint

Testing pipelines for genome-wide SNP calling from Genotyping-By-Sequencing (GBS) data for Pinus ponderosa

Abstract: Background: Single Nucleotide Polymorphism (SNP) markers have rapidly gained popularity due to their abundance in most genomes and their amenability to high-throughput genotyping techniques. Reduced-representation restriction-enzyme-based sequencing methods (GBS or RADseq) have been demonstrated to be robust and cost-effective genotyping methods. While previous studies have shown that alignment of the short-read fragments to a genome sequence results in better SNP calling than de novo approaches, only a few tr…

Cited by 6 publications (6 citation statements)
References 42 publications
“…There are several methods by which to detect such problematic sites, such as filtering by coverage (Dou et al, 2012), disomic models such as Hardy‐Weinberg proportions (Catchen et al, 2013; Chen et al, 2014; Hohenlohe et al, 2011), or gene annotation, though there are several shortcomings (see descriptions of these shortcomings in Table 1 of McKinney et al, 2017). When individual sequencing data is available for the same individuals or populations, such information can be used to isolate potentially paralogous sites from pool‐seq exome capture studies (e.g., Rellstab et al, 2019; Shu & Moran, 2020). However, a potentially cost‐saving alternative would be to sequence the haploid tissue of a single individual (if available).…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…This is particularly true for pool‐seq data sets relying on read counts for allele frequency estimation or population genetic inferences such as genotype‐environment associations. While individually sequenced data sets may be one path forward to identifying such problematic sites (as in Rellstab et al, 2019; Shu & Moran, 2020), the sequencing of sufficient quantities of DNA from haploid gametophyte tissue available for some plants, including conifers, seedless vascular plants, and bryophytes, offers an alternate path forward to balance sequencing cost and data reliability, particularly for organism using diverged and or highly fragmented reference genomes.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…This is particularly true for pool-seq datasets relying on read counts for allele frequency estimation or population genetic inferences such as genotype-environment association (e.g., as implemented in baypass, Gautier et al 2015). While individually sequenced datasets may be one path forward to identifying such problematic sites (as in Rellstab et al 2019, Shu & Moran 2020), it will be cost-prohibitive for large projects with many populations. As such, the sequencing of sufficient quantities of DNA from haploid gametophyte tissue available for some plants, including conifers,…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…For instance, conifers often have exceptionally large genomes (20–40 Gbp; Neale et al, 2017) with histories of whole‐genome duplication (Zheng et al, 2015), gene family expansion (Scott et al, 2020; De La Torre et al, 2014), transposable element dynamics (Scott et al, 2020; Wang et al, 2020; Yi et al, 2018), and extensive repeat regions (Wegrzyn et al, 2014). These complexities present a major challenge for NGS data analysis and downstream hypothesis testing in conifers (Lind et al, 2022; Shu & Moran, 2020). Such challenges can be alleviated by quantifying the accuracy of SNP calling when using the above‐mentioned variant calling tools.…”
Section: Introduction
Citation type: mentioning
confidence: 99%