2020
DOI: 10.3390/genes11010050
Consensify: A Method for Generating Pseudohaploid Genome Sequences from Palaeogenomic Datasets with Reduced Error Rates

Abstract: A standard practice in palaeogenome analysis is the conversion of mapped short read data into pseudohaploid sequences, frequently by selecting a single high-quality nucleotide at random from the stack of mapped reads. This controls for biases due to differential sequencing coverage, but it does not control for differential rates and types of sequencing error, which are frequently large and variable in datasets obtained from ancient samples. These errors have the potential to distort phylogenetic and population…

Cited by 20 publications (4 citation statements). References 51 publications.
“…In addition to being able to analyze unphased data, our method can also take a form of pseudo-haploid data as input. Generating pseudo-haploid data is a strategy often applied to low-coverage sequencing data, where reliable diploid genotype calls are not feasible, and may introduce unwanted biases, for example in ancient human DNA studies [ 42 ]. In pseudo-haploid data, at each SNP, one sequencing read covering the respective SNP is chosen uniformly at random, and the allele on this read is then reported as the haploid genotype for the individual.…”
Section: Results
Citation type: mentioning, confidence: 99%
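The random read sampling described in this citation statement can be summarised in a few lines of Python. The following is a minimal sketch, not code from the paper: the pileup structure (a dict mapping SNP positions to the alleles observed on reads covering that position) and all names are hypothetical.

```python
import random

def pseudo_haploidize(pileup, seed=None):
    """At each SNP, choose one covering read uniformly at random and
    report its allele as the haploid genotype (random read sampling).

    pileup: dict mapping SNP position -> list of alleles observed on the
            reads covering that position (hypothetical input structure).
    Returns: dict mapping SNP position -> single sampled allele.
    """
    rng = random.Random(seed)
    genotypes = {}
    for pos, alleles in pileup.items():
        if alleles:  # skip positions with no read coverage
            genotypes[pos] = rng.choice(alleles)
    return genotypes

# Toy example with three SNPs of varying coverage
toy_pileup = {
    1042: ["A", "A", "G"],
    2310: ["C"],
    5788: ["T", "T", "T", "C"],
}
print(pseudo_haploidize(toy_pileup, seed=42))
```

Note that this single-read draw is exactly the step that propagates per-read sequencing errors into the pseudohaploid sequence, which is the problem the Consensify method is designed to reduce.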
“…We did not obtain the sequencing of Pg1, Pg5, and Pgx1, which might be due to sequencing errors. The sequencing errors lead to a low sequencing rate and/or no sequencing that could be due to high heterozygosity in genetic regions, homopolymeric sequences, runs of G or C in the regions, stops in the regions, and polymerase slippage during the Sanger sequencing method [ 26 , 27 , 28 , 29 ]. The sequences deposited in GenBank were assigned the accession numbers given in Supplementary Table S3 .…”
Section: Results
Citation type: mentioning, confidence: 99%
“…As input fasta files for D statistics ( Green et al 2010 ; Durand et al 2011 ), the nuclear diversity estimates and Bayesian molecular dating, we generated pseudohaploid consensus sequences for all specimens using Consensify ( Barlow et al 2020 ). We generated the base count input file using ANGSD v0.923 ( Korneliussen et al 2014 ; -minQ 30, -minMapQ 30, -uniqueonly 1, -remove_bads 1, -baq 2, -dumpCounts 3, -C 0, -only_proper_pairs 0, -doCounts 1, -trim 0).…”
Section: Methods
Citation type: mentioning, confidence: 99%
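For illustration, the ANGSD base-count call quoted in this Methods excerpt could be assembled as sketched below. This is an assumption-laden sketch, not the citing authors' exact command: only the filtering and counting options are taken from the excerpt, while the -bam, -ref, and -out arguments are standard ANGSD input/output options whose values here are placeholders.

```python
import subprocess

# Placeholder file names; only the options from "-minQ" onward are quoted
# verbatim from the citing study's Methods text.
angsd_cmd = [
    "angsd",
    "-bam", "bam_list.txt",       # placeholder: text file listing BAM files
    "-ref", "reference.fasta",    # placeholder: reference genome (typically needed when -baq is set)
    "-out", "sample_basecounts",  # placeholder: output prefix
    # filters and counting options quoted in the Methods excerpt above
    "-minQ", "30",
    "-minMapQ", "30",
    "-uniqueonly", "1",
    "-remove_bads", "1",
    "-baq", "2",
    "-dumpCounts", "3",
    "-C", "0",
    "-only_proper_pairs", "0",
    "-doCounts", "1",
    "-trim", "0",
]

# Run ANGSD; the resulting base count files would then be supplied to
# Consensify to generate the pseudohaploid consensus sequence.
subprocess.run(angsd_cmd, check=True)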