2020
DOI: 10.1101/2020.03.03.962365
Preprint

CoalQC - Quality control while inferring demographic histories from genomic data: Application to forest tree genomes

Abstract: Estimating demographic histories using genomic datasets has proven to be useful in addressing diverse evolutionary questions. Despite improvements in inference methods and availability of large genomic datasets, quality control steps to be performed prior to the use of sequentially Markovian coalescent (SMC) based methods remain understudied. While various filtering and masking steps have been used by previous studies, the rationale for such filtering and its consequences have not been assessed systematically…

Citations: cited by 2 publications (4 citation statements)
References: 54 publications
“…The remaining parameters were left as the default values used for humans (Li & Durbin, 2011), and we performed 100 bootstrap resamplings on all PSMC analyses to assess variance of the model. We also conducted PSMC after masking of repeat regions to control for potential bias due to variation in collapsed repeats (Patil et al., 2020), but because the removal of repeat regions resulted in significantly less data, the PSMC parameters could not be optimized for sufficient resolution in older time periods, resulting in higher variance for those intervals. The effects of collapsed repeats (expected to be less pronounced in a very complete genome like this one, where most repeats have been resolved) are expected to only change the Ne estimates in the most recent and oldest time periods (Patil et al., 2020).…”
Section: Methods (mentioning)
confidence: 99%
“…We also conducted PSMC after masking of repeat regions to control for potential bias due to variation in collapsed repeats (Patil et al., 2020), but because the removal of repeat regions resulted in significantly less data, the PSMC parameters could not be optimized for sufficient resolution in older time periods, resulting in higher variance for those intervals. The effects of collapsed repeats (expected to be less pronounced in a very complete genome like this one, where most repeats have been resolved) are expected to only change the Ne estimates in the most recent and oldest time periods (Patil et al., 2020). Our repeat‐masked plot looks nearly identical to the full‐genome plot in the most recent and middle time periods, and similar (but higher variance) in the oldest time periods (Figure S1).…”
Section: Methods (mentioning)
confidence: 99%
“…tularosa was reconstructed using the cleaned Illumina paired‐end reads. To help prevent the inadvertent incorporation of nuclear translocations (NUMTs) into the mtDNA assembly, we first used coalqc (v.0.1; Patil et al 2020) and samtools (v.1.17; Li et al, 2009) to extract Illumina reads that aligned to the Devils Hole pupfish (C. diabolis) mtDNA assembly (NC_030345.1; Lema et al, 2016).…”
Section: Methods (mentioning)
confidence: 99%
“…The complete mtDNA sequence of C. tularosa was reconstructed using the cleaned Illumina paired-end reads. To help prevent the inadvertent incorporation of nuclear translocations (NUMTs) into the mtDNA assembly, we first used coalqc (v.0.1; Patil et al 2020) and samtools (v.1.17; Li et al, 2009) to extract Illumina reads that aligned to the Devils Hole pupfish (C. diabolis) mtDNA assembly (NC_030345.1; Lema et al, 2016). The resulting bam file was then converted back to fastq file format with bedtools (v.2.29.0; Quinlan & Hall, 2010) and used alongside the C. diabolis full mitogenome as a backbone for the C. tularosa mtDNA genome assembly with mitobim (v.1.8; Hahn et al, 2013).…”
Section: Quality Control and Genome Assemblies (mentioning)
confidence: 99%