2022
DOI: 10.1101/2022.11.20.517268
Preprint

Exome-wide benchmark of difficult-to-sequence regions using short-read next-generation DNA sequencing

Abstract: Next-generation DNA sequencing (NGS) in short-read mode has been recently used for genetic testing in various clinical settings. NGS data accuracy is crucial in clinical settings, and several reports regarding quality control of NGS data, focusing mostly on establishing NGS sequence read accuracy, have been published thus far. Variant calling is another critical source of NGS errors that remains mostly unexplored despite its established significance. In this study, we used a machine-learning-based method to es…

Cited by 2 publications (3 citation statements)
References 24 publications (35 reference statements)
“…Unfortunately, none of the kits evaluated in this study excelled in covering these problematic areas although they didn't show a dramatic shift in coverage of GC content. Now it becomes more clear that well-designed probes kits are less affected by GC-bias and the problem of regions with low mapping quality is more related with short read sequencing (13, 17-19). We recommend that researchers consider these limitations when designing experiments, and also refer to the IGV visualization of difficult-to-sequence genes from Hijikata's article, which we found particularly helpful.…”
Section: Discussion (mentioning)
confidence: 99%
“…It has been previously identified that approximately 1Mb of the human exome can be skipped during sequencing (17,18). Recent research has precisely localized and described these difficult-to-sequence regions in exome data, which are mainly affected by low-mappability regions, such as pseudogenes, tandem repeats, homopolymers, and other low-complexity regions (19). While WGS increases the diagnostic yield over WES (20), understanding WES's challenges could drive improvements, minimize limitations and make it a more cost-effective procedure.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, 42% (47,315/113,696) of the high-confidence SVs occur fully outside of the GIAB Tier 1 regions, and visual inspection of 30 events confirmed the presence of an SV. We also identified 407 high-confidence SVs within coding regions defined as unreliable for variant identification using short-read sequencing based on analysis of gnomAD data (Hijikata et al 2024). In both cases, these SVs reside in regions that may be filtered by variant annotation pipelines.…”
Section: Structural Variation Within Medically Relevant Genes (mentioning)
confidence: 99%