2020
DOI: 10.1093/nar/gkaa829

Sensitive alignment using paralogous sequence variants improves long-read mapping and variant calling in segmental duplications

Abstract: The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. I…

Cited by 14 publications (9 citation statements)
References 56 publications
“…4a). However, it was challenging to identify the exact location of the duplicated region based on the sequencing data due to the short read length of Illumina sequencing 50, 51. We thus deleted the original region of PP_5050–PP_5242 in QP478, using the known sequences outside the region as homologous arms, to generate LC171, and deleted a portion of the duplication from PP_5084–PP_5242 to generate strain LC173.…”
Section: Results (mentioning)
confidence: 99%
“…Finally, we note that Vulcan could be used for any combination of long-read mappers that output the edit distance (NM tag) directly within sam/bam file output. Thus, allowing the inclusion of WinnowMap [39], LAST [21], LRA [23], or Duplomap [40] may further exploit our observation that variable gap costs for different read-to-reference mappings provide improved SV calling while offering improved runtimes compared to the more computationally expensive long-read mapping approaches.…”
Section: Discussion (mentioning)
confidence: 99%
“…Collapsed duplication FP regions and population CNV FP regions are both related to collapsed duplications errors in the GRCh38, recently identified and corrected by the T2T consortium [23]. These regions are populated with paralog-specific variants (PSV)-variation among paralogous sequences, which impact short-read variant calling [24]. In addition, the long segmental duplications with high sequence identity are a challenge for long-read mapping accuracy [24].…”
Section: Performance Of Various Stratification Regions On ONT-Illumina (mentioning)
confidence: 99%
“…These regions are populated with paralog-specific variants (PSV)-variation among paralogous sequences, which impact short-read variant calling [24]. In addition, the long segmental duplications with high sequence identity are a challenge for long-read mapping accuracy [24]. Therefore, we found lower SNPs F1 scores using only ONT or Illumina data (collapsed duplication FP regions for either ONT or Illumina: 0.7797, 0.4263; population CNV FP regions: 0.9115, 0.7720).…”
Section: Performance Of Various Stratification Regions On ONT-Illumina (mentioning)
confidence: 99%