2019
DOI: 10.1101/557314
Preprint

Mash Screen: High-throughput sequence containment estimation for genome discovery

Abstract: The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this t…
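
To make the distinction between resemblance and containment concrete, here is a minimal sketch of a streaming containment estimate. It is not the Mash implementation: the function names (`kmers`, `kmer_hash`, `bottom_sketch`, `screen`), the BLAKE2b hash, and the default `k` and `s` values are all assumptions chosen for illustration, and reverse-complement (canonical) k-mers are omitted for brevity. The idea shown is to sketch the reference genome once, stream the reads a single time, and report the fraction of the reference sketch that was observed.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (no reverse complements)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Hash a k-mer to a 64-bit integer (illustrative choice of hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k, s):
    """Return the s smallest distinct k-mer hashes of seq as a set."""
    hashes = {kmer_hash(km) for km in kmers(seq, k)}
    return set(sorted(hashes)[:s])


def screen(reference, read_stream, k=5, s=1000):
    """Estimate containment of reference within a stream of reads.

    Only the reference is sketched; the reads are consumed once and
    never stored, which is what makes the estimate 'online'.
    """
    sketch = bottom_sketch(reference, k, s)
    seen = set()
    for read in read_stream:
        for km in kmers(read, k):
            h = kmer_hash(km)
            if h in sketch:
                seen.add(h)
    # Fraction of reference sketch hashes observed in the reads.
    return len(seen) / len(sketch) if sketch else 0.0
```

Because the denominator is the size of the reference sketch rather than the union of both k-mer sets, the score does not shrink as the read set grows, which is the property the abstract says plain resemblance estimation lacks.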

Cited by 54 publications (67 citation statements)
References 28 publications
“…Because the PLASMIDOME data set did not contain information about the reference plasmids, we generated some references for this data set by mapping the assembled PLASMIDOME contigs against the plasmid database (with Mash screen [Ondov et al 2019], QUAST [Gurevich et al 2013], and BLAST). This analysis revealed 10 reference plasmids with a total length of ≈100 kb.…”
Section: Analyzing the Plasmidome Data Set
Citation type: mentioning
confidence: 99%
“…From these, we selected n=726 plasmids which contained an IncF replicon after classification with MOB-typer (see below). We searched all plasmids against PLSDB (version 2020-03-04) [33] which contains 20,668 complete published plasmids, using mash screen [34] and keeping the top hit. All plasmids had a match.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…To date, a total of 102 polyomaviruses have been reported around the world, 13 of which are human polyomaviruses (HPyVs), including BKPyV and JCPyV identified in the early 1970s, and 11 types of HPyV detected since 2007, including KIPyV, WUPyV, and MCPyV [2]. Additionally, Quebec polyomavirus (QPyV) discovered recently from human fecal samples through MinHash algorithm has not yet be included by ICTV [22]. HPyVs can cause diseases of the nervous system, hematopoietic system, urogenital tract, and skin [6], with two main characteristics.…”
Section: Discussion
Citation type: mentioning
confidence: 99%