2020
DOI: 10.1101/2020.02.11.943241
Preprint

Weighted minimizer sampling improves long read mapping

Abstract: Motivation: In this era of exponential data growth, minimizer sampling has become a standard algorithmic technique for rapid genome sequence comparison. This technique yields a sub-linear representation of sequences, enabling their comparison in reduced space and time. A key property of the minimizer technique is that if two sequences share a substring of a specified length, then they can be guaranteed to have a matching minimizer. However, because the k-mer distribution in eukaryotic genomes is highly uneven, …
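
As a rough illustration of the minimizer technique the abstract describes, the sketch below selects the smallest k-mer from every window of w consecutive k-mers. It is not the paper's weighted scheme and not Winnowmap's implementation. The function name `minimizers`, the parameters `k` and `w`, and the plain lexicographic ordering are assumptions made for illustration; practical tools typically order k-mers by a hash value instead.

```python
from typing import List, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Return (position, k-mer) pairs chosen as window minimizers.

    Illustrative assumption: k-mers are compared lexicographically and
    ties are broken by taking the leftmost k-mer in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: List[Tuple[int, str]] = []
    # Slide a window over every run of w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first index among equal keys, i.e. the leftmost.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Adjacent windows often pick the same k-mer; record it only once.
        if not selected or selected[-1][0] != pos:
            selected.append((pos, kmers[pos]))
    return selected


if __name__ == "__main__":
    k, w = 3, 4
    shared = "GATTAC"  # length w + k - 1 = 6
    a = "CCC" + shared + "TTT"
    b = "AAG" + shared + "GGA"
    common = {m for _, m in minimizers(a, k, w)} & {m for _, m in minimizers(b, k, w)}
    print(sorted(common))
```

In this sketch the two sequences share a substring of length w + k - 1, which spans exactly one full window of w k-mers, so both sequences select the same minimizer from that window. This is the matching guarantee the abstract refers to.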

Cited by 49 publications (97 citation statements)
References 41 publications

Citation statements
“…Nevertheless, post-processing using DuploMap achieved a higher recall (0.906) while maintaining a high precision (0.9954). We also evaluated Winnowmap ( 24 ), a long-read alignment tool that uses a weighted sampling-based method for selecting minimizers to improve long-read mapping using Minimap2 in long tandem repeats. However, Winnowmap’s recall and precision in Long-SegDups regions were lower than those for Minimap2 ( Supplementary Figure S3 ).…”
Section: Results (mentioning)
confidence: 99%
“…Recent work has shown that the accuracy of long-read mapping in extra-long tandem repeats in the human genome—typically found in centromeres—can be improved using specialized computational methods ( 23–25 ) that are designed to exploit the sequence and structure of long repeats. For example, the Winnowmap algorithm ( 24 ) modifies the sequence matching algorithm to avoid filtering out repeated k -mers that are common in tandem repeats ( 24 ).…”
Section: Introduction (mentioning)
confidence: 99%
“…The total length of HumanGenome after compressing all homopolymer runs is 2,133,004,165. The chrX dataset was generated by mapping the T2T dataset to HumanGenome using Winnowmap (Jain, 2020) and selecting reads that mapped to chrX. In rare cases when a read maps to multiple nearly identical instances of a repeat, Winnowmap…”
Section: Appendices (mentioning)
confidence: 99%
“…Nevertheless, post-processing using DuploMap achieved a higher recall (0.906) while maintaining a high precision (0.9954). We also evaluated Winnowmap [24], a long-read alignment tool that uses a weighted sampling based method for selecting minimizers to improve long-read mapping using Minimap2 in long tandem repeats. However, Winnowmap's recall and precision in Long-SegDups regions were lower than those for Minimap2 (Supp.…”
Section: Evaluation of Mapping Accuracy Using Simulated Reads (mentioning)
confidence: 99%
“…Long repeated sequences in the human genome result in multiple locations with high scores and pose problems for long-read alignment tools. Recent work has shown that the accuracy of long-read mapping in extra-long tandem repeats in the human genome -typically found in centromeres -can be improved using specialized computational methods [23,24,25] that are designed to exploit the sequence and structure of long repeats. For example, the Winnowmap algorithm [24] modifies the sequence matching algorithm to avoid filtering out repeated k-mers that are common in tandem repeats [24].…”
Section: Introduction (mentioning)
confidence: 99%