2021
DOI: 10.1038/s41592-021-01299-w

Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads

Abstract: Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data has demonstrated a long read length, but current interpretation methods for its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline PEPPER-Margin-DeepVariant that produces state…

Cited by 199 publications (212 citation statements)
References: 56 publications
“…The literature often categorizes sequences consisting of one repeated base as “homopolymers”, and repeated sequences of at least two bases as “tandem repeats” or “copolymers” [10, 23]. Short tandem repeats (STRs) are often defined as repeated units 2-6 bases in length, and are also known as “microsatellites” or “simple sequence repeats” (SSRs) [3].…”
Section: Algorithm (mentioning)
confidence: 99%
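The repeat categories quoted above can be made concrete with a small classifier. This is a minimal sketch, assuming a "repeat" means a sequence tiled exactly by at least two copies of its shortest unit; the helper names `repeat_unit` and `classify_repeat` are hypothetical and not taken from any cited tool. Real repeat annotators also tolerate partial final copies and sequencing errors, which this sketch ignores.

```python
from typing import Optional


def repeat_unit(seq: str) -> Optional[str]:
    """Hypothetical helper: return the shortest unit that tiles `seq` exactly
    at least twice, or None if no such unit exists."""
    n = len(seq)
    for u in range(1, n // 2 + 1):
        # The unit length must divide the sequence length and reproduce it.
        if n % u == 0 and seq[:u] * (n // u) == seq:
            return seq[:u]
    return None


def classify_repeat(seq: str) -> str:
    """Hypothetical classifier following the quoted definitions:
    1-base unit -> homopolymer; 2-6-base unit -> STR (microsatellite/SSR);
    longer unit -> tandem repeat."""
    unit = repeat_unit(seq.upper())
    if unit is None:
        return "not a repeat"
    if len(unit) == 1:
        return "homopolymer"
    if len(unit) <= 6:
        return "STR"
    return "tandem repeat"


for s in ["AAAAAA", "CAGCAGCAG", "AGCTTAGCAAGCTTAGCA", "ACGT"]:
    print(s, classify_repeat(s))
```

Here `CAGCAGCAG` has a 3-base unit and so falls in the STR range, while `AGCTTAGCAAGCTTAGCA` repeats a 9-base unit and is only a generic tandem repeat under these definitions.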
“…The two current leading nanopore variant callers are Clair3 (developed by the HKUCS Bioinformatics Algorithm Lab) and PEPPER-Margin-DeepVariant (a collaboration between UCSC and Google Health, hereafter referred to as PEPPER) [23, 31]. Both tools have converged on a similar variant calling pipeline: basecalling, read alignment, pileup-based variant calling (using pileup summary statistics), read phasing, and full-alignment variant calling (using all read information).…”
Section: Introduction (mentioning)
confidence: 99%
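The pileup-based stage named in this statement works from per-position summaries of the aligned reads. The sketch below shows one simple form of such summary statistics: base counts per reference column plus an alternate-allele-fraction cutoff. It assumes ungapped alignments given as `(ref_start, seq)` pairs; `pileup_counts`, `candidate_sites`, and `min_alt_fraction` are hypothetical names, and this is not the feature encoding used by PEPPER or Clair3.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def pileup_counts(reads: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count observed bases at each reference position.

    Assumes ungapped alignments given as (ref_start, seq) pairs, a
    simplification: real aligned reads carry CIGAR strings with indels.
    """
    columns: Dict[int, Counter] = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return columns


def candidate_sites(reference: str, columns: Dict[int, Counter],
                    min_alt_fraction: float = 0.2) -> List[Tuple[int, str, str, int, int]]:
    """Report (pos, ref, alt, alt_count, depth) wherever a non-reference base
    reaches min_alt_fraction of the column depth (hypothetical cutoff)."""
    candidates = []
    for pos in sorted(columns):
        counts = columns[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        for base, n in sorted(counts.items()):
            if base != ref_base and n / depth >= min_alt_fraction:
                candidates.append((pos, ref_base, base, n, depth))
    return candidates


reference = "ACGTACGT"
reads = [(0, "ACGTA"), (0, "ACTTA"), (2, "GTACG"), (2, "TTACG")]
print(candidate_sites(reference, pileup_counts(reads)))
```

In this toy input, half of the four reads covering position 2 carry `T` instead of the reference `G`, so that column is the only one flagged as a candidate. A pipeline of the kind described would pass such candidates on to phasing and full-alignment calling rather than reporting them directly.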
“…The polishCLR workflow will increase the efficiency of polishing many genomes and reduce the potential of human error in this multistep process. Despite the much-reduced error rate of PacBio HiFi and ONT reads, polishing approaches continue to be an important component of accurate genome assembly (Shafin et al, 2021). Although this pipeline was not designed to polish with ONT reads, the workflow is available on GitHub and welcomes any future contributions.…”
Section: Main (mentioning)
confidence: 99%
“…We denote by $F_z$ a quantizer that assigns the value $F_z(x) = z$ to every quality score $x$. In our experiments we take $z = 10$, which is a common threshold for filtering low quality reads (see, e.g., [13]). In addition, since repetitive patterns of bases are particularly difficult to sequence by nanopore technologies [14], we evaluate quantizers where the quantization of a quality score $x$ in a position $i$ of a read depends not only on $x$ but also on the bases called in positions close to $i$.…”
Section: Quantization Of Quality Scores (mentioning)
confidence: 99%
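The constant quantizer $F_z$ and the idea of a context-dependent quantizer can be sketched as follows. `constant_quantizer` implements $F_z(x) = z$ directly. `context_quantizer` is a hypothetical example of the second kind: it keeps the original score at positions inside a homopolymer run of at least `min_run` bases and applies $F_z$ elsewhere. The run-length rule and the default `min_run=4` are illustrative assumptions, not the quantizers evaluated in the citing work.

```python
from typing import List


def constant_quantizer(qualities: List[int], z: int = 10) -> List[int]:
    """F_z: assign the value z to every quality score."""
    return [z] * len(qualities)


def homopolymer_run_lengths(bases: str) -> List[int]:
    """For each position, the length of the single-base run that contains it."""
    lengths = [0] * len(bases)
    i = 0
    while i < len(bases):
        j = i
        while j < len(bases) and bases[j] == bases[i]:
            j += 1
        for k in range(i, j):
            lengths[k] = j - i
        i = j
    return lengths


def context_quantizer(bases: str, qualities: List[int], z: int = 10,
                      min_run: int = 4) -> List[int]:
    """Hypothetical context-dependent quantizer: keep the original score x at
    positions inside a homopolymer run of >= min_run bases, else apply F_z."""
    if len(bases) != len(qualities):
        raise ValueError("bases and qualities must have equal length")
    runs = homopolymer_run_lengths(bases)
    return [q if run >= min_run else z for q, run in zip(qualities, runs)]


bases = "ACGGGGT"
quals = [30, 25, 20, 12, 8, 15, 35]
print(constant_quantizer(quals))
print(context_quantizer(bases, quals))
```

With these inputs the constant quantizer maps every score to 10, while the context-dependent sketch keeps the four scores inside the `GGGG` run and maps the rest to 10. The choice reflects the quoted motivation: resolution is retained only where the bases suggest sequencing is hardest.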