2018
DOI: 10.1093/bioinformatics/bty597

A fast adaptive algorithm for computing whole-genome homology maps

Abstract: Motivation: Whole-genome alignment is an important problem in genomics for comparing different species, mapping draft assemblies to reference genomes and identifying repeats. However, for large plant and animal genomes, this task remains compute and memory intensive. In addition, current practical methods lack any guarantee on the characteristics of output alignments, thus making them hard to tune for different application requirements. Results: We introduce an approximate algorithm for computing local alignment bo…

Citations: cited by 140 publications (137 citation statements)
References: 39 publications
“…HLL lacks another advantage of MinHash; when MinHash is used in conjunction with a reversible hash function, it can be used not only to calculate the relevant set cardinalities but also to report the k-mers common between the sets. This can provide crucial hints when the eventual goal is to map a read to (or near) its point of origin with respect to the reference, as is the goal for tools like MashMap [5].…”
Section: Discussion (mentioning)
confidence: 99%
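
The point about reversible hash functions can be illustrated with a minimal sketch. It assumes a toy invertible hash (multiplication by an odd constant modulo a power of two) over 2-bit-encoded k-mers. The constant, the k-mer length and all function names are hypothetical choices for illustration and are not taken from MashMap or any cited tool; canonical k-mers and reverse complements are ignored for brevity.

```python
K = 5                       # toy k-mer length (assumption)
BITS = 2 * K                # 2 bits per base
MASK = (1 << BITS) - 1
A = 0x9E3779B1              # odd multiplier, so x -> x*A mod 2^BITS is a bijection
A_INV = pow(A & MASK, -1, 1 << BITS)  # modular inverse (Python 3.8+)

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}
DEC = "ACGT"

def encode(kmer):
    """Pack a k-mer into an integer, 2 bits per base."""
    x = 0
    for base in kmer:
        x = (x << 2) | ENC[base]
    return x

def decode(x):
    """Unpack an integer produced by encode() back into a k-mer."""
    return "".join(DEC[(x >> (2 * i)) & 3] for i in reversed(range(K)))

def h(x):
    """Toy reversible hash: a bijection on BITS-bit integers."""
    return (x * A) & MASK

def h_inv(y):
    """Inverse of h()."""
    return (y * A_INV) & MASK

def sketch(seq, s):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes."""
    hashes = {h(encode(seq[i:i + K])) for i in range(len(seq) - K + 1)}
    return sorted(hashes)[:s]

def shared_kmers(sk_a, sk_b):
    """Invert the hashes common to two sketches to recover the k-mers."""
    return sorted(decode(h_inv(y)) for y in set(sk_a) & set(sk_b))
```

Because the hash is a bijection, every value shared by two sketches maps back to exactly one k-mer, which is what allows the common k-mers to be reported and not merely counted.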
“…Since the release of the seminal Mash tool [1], data sketches such as MinHash have become instrumental in comparative genomics. They are used to cluster genomes from large databases [1], search for datasets with certain sequence content [2], accelerate the overlapping step in genome assemblers [3,4], map sequencing reads [5], and find similarity thresholds characterizing species-level distinctions [6]. Whereas MinHash was originally developed to find similar web pages [7], here it is being used to summarize large genomic sequence collections such as reference genomes or sequencing datasets.…”
Section: Introduction (mentioning)
confidence: 99%
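
As a rough illustration of how a MinHash sketch summarizes a sequence, the following sketch estimates the Jaccard similarity of two k-mer sets from bottom-s sketches. It is a generic textbook-style estimator under assumed parameters, not the implementation used by Mash or MashMap; the 64-bit BLAKE2b hash and all function names are choices made here for illustration.

```python
import hashlib

def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def hash64(kmer):
    """64-bit hash of a k-mer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def bottom_sketch(seq, k, s):
    """Bottom-s sketch: the s smallest hash values over all k-mers."""
    return set(sorted(hash64(x) for x in kmers(seq, k))[:s])

def jaccard_estimate(sk_a, sk_b, s):
    """Estimate Jaccard similarity from two bottom-s sketches.

    Take the s smallest hashes of the merged sketches and count how
    many of them are present in both inputs.
    """
    union_bottom = sorted(sk_a | sk_b)[:s]
    if not union_bottom:
        return 0.0
    shared = sum(1 for y in union_bottom if y in sk_a and y in sk_b)
    return shared / len(union_bottom)
```

When s is at least the number of distinct k-mers, the estimate coincides with the exact Jaccard index; smaller s trades accuracy for a fixed-size summary.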
“…To obtain similar sequences within a reference, we mapped the spliced transcript sequences against a version of the genome where all exon segments were hard-masked (i.e., replaced with N). We performed this mapping using MashMap [20], with a segment size 500 and minimum percent identity of 80%. The sequence similar regions were merged (per-chromosome) using BedTools [45] and concatenated, giving a decoy sequence for each chromosome.…”
Section: Decoy Sequences (mentioning)
confidence: 99%
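
The merge-and-concatenate step described in this statement can be sketched in plain Python. This is only an approximation of the idea, assuming half-open `(chrom, start, end)` intervals such as might be derived from the mapping output; it does not call BedTools or MashMap, and the function names are hypothetical.

```python
from collections import defaultdict

def merge_intervals(records):
    """Merge overlapping or touching intervals separately per chromosome.

    records: iterable of (chrom, start, end) with half-open coordinates.
    Returns a dict mapping chrom -> sorted list of merged (start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps or abuts the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged

def decoy_sequence(chrom_seq, intervals):
    """Concatenate the merged regions of one chromosome into one decoy."""
    return "".join(chrom_seq[start:end] for start, end in intervals)
```

For example, `merge_intervals([("chr1", 10, 20), ("chr1", 15, 30)])` collapses the two overlapping hits into a single region, so each stretch of similar sequence contributes to the decoy only once.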
“…We also attempt to address one of the failure modes of direct alignment against the transcriptome, compared to spliced alignment to the genome: when a sequenced fragment originates from an unannotated genomic locus bearing sequence similarity to an annotated transcript, it can be falsely mapped to the annotated transcript since the relevant genomic sequence is not available to the method. We describe a procedure that makes use of MashMap [20] to identify and extract such sequence similar decoy regions from the genome. The normal Salmon index is then augmented with these decoy sequences, which are handled in a special manner during mapping and alignment scoring, leading to a reduction in such cases of false mappings.…”
Section: Introduction (mentioning)
confidence: 99%
“…[77]. Whole-genome alignment was computed by MashMap (https://github.com/marbl/MashMap) employing default settings, and was visualized as a dot plot [78].…”
Section: Genome Assembly Analysis Of Genomic Features and Synteny Co… (mentioning)
confidence: 99%