2016
DOI: 10.1007/978-3-319-31957-5_11

Safe and Complete Contig Assembly Via Omnitigs

Abstract: Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs, a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question remains: given a genome graph G (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from G as contigs?…

Cited by 12 publications (13 citation statements)
References 35 publications
“…With this definition, a set of maximal non-branched paths can be derived from . Here, a maximal non-branch path indicates a path that meets the following conditions: (i) for the first vertex, the in-degree is 0 or >1, and the out-degree is 1; (ii) for the last vertex, the out-degree is 0 or >1, and the in-degree 1; (iii) for all the other internal vertices, the in- and out-degrees are exactly 1 (Tomescu and Medvedev, 2016). Such paths are usually termed as ‘unipaths’ or ‘unitigs’, which are commonly used in the de novo assembly of genome (Gnerre et al., 2011; Zimin et al., 2013; Tomescu and Medvedev, 2016).…”
Section: Methods (mentioning)
confidence: 99%
“…A set of maximal non-branched paths (unipaths) can be derived from D. Each of the unipaths meets the following conditions: i) for the first vertex, the in-degree is 0 or >1, and the out-degree is 1; ii) for the last vertex, the out-degree is 0 or >1, and the in-degree 1; iii) for all the other vertices along the path, the in- and out-degrees are exactly 1. This definition follows previous studies (Tomescu and Medvedev, 2016), and the string implied by compacting a certain unipath is called a "unitig" (Gnerre et al., 2011; Zimin et al., 2013).…”
Section: Preliminary (mentioning)
confidence: 97%
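
The three degree conditions quoted above can be checked mechanically. The following is a minimal Python sketch, not code from any of the cited papers: the function name `unipaths` and the edge-list input format are assumptions made for illustration, and the graph is assumed to have no parallel edges. It applies the quoted conditions literally, so a walk whose last vertex fails condition (ii) is discarded rather than reported.

```python
from collections import defaultdict


def unipaths(edges):
    """Return maximal non-branching paths of a directed graph.

    `edges` is an iterable of (u, v) pairs (hypothetical input format).
    A path is reported only if it satisfies the three quoted conditions:
      first vertex:    in-degree 0 or >1, out-degree 1
      last vertex:     out-degree 0 or >1, in-degree 1
      internal vertex: in-degree 1 and out-degree 1
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    vertices = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        vertices.update((u, v))

    def in_d(v):
        return indeg.get(v, 0)

    def out_d(v):
        return len(succ.get(v, []))

    def is_start(v):
        return in_d(v) != 1 and out_d(v) == 1

    def is_internal(v):
        return in_d(v) == 1 and out_d(v) == 1

    def is_end(v):
        return in_d(v) == 1 and out_d(v) != 1

    paths = []
    for v in sorted(vertices):  # sorted only to make the output deterministic
        if not is_start(v):
            continue
        path = [v]
        cur = succ[v][0]
        # Extend through vertices that have exactly one way in and one way out.
        while is_internal(cur):
            path.append(cur)
            cur = succ[cur][0]
        # Keep the path only if the stopping vertex satisfies condition (ii).
        if is_end(cur):
            path.append(cur)
            paths.append(path)
    return paths
```

For the edges `s→a, a→b, b→c, c→d, c→e`, this sketch returns the single path `['s', 'a', 'b', 'c']`: `s` has in-degree 0, `a` and `b` are internal, and `c` branches. Isolated cycles have no valid first vertex and are therefore not reported under this literal reading.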
“…The idea of this paper originates from a more general research question in bioinformatics about partial solutions common to all solutions. The typical example is the contig assembly problem, where contigs are those strings that are part of all possible genomic reconstructions from an assembly graph (be they Eulerian, Hamiltonian, or just node/edge covering paths); see e.g., [33]. Another example is the sequence alignment problem, where some studies consider reliable partial alignments [12], [16], [34], or base-pairings common to all optimal or almost-optimal alignments [4], [38].…”
Section: Previous Work On Safe Solutions (mentioning)
confidence: 99%
“…In this paper we address the gap filling problem with the "safe and complete" framework proposed in [33] for the contig assembly problem. This framework defines an algorithm for a problem to be: (i) safe, if it returns only partial solutions that are common to all solutions to the problem (these common partial solutions are also called safe), and (ii) complete, if it returns all safe solutions.…”
Section: Introduction (mentioning)
confidence: 99%
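
The "safe" and "complete" notions quoted above can be illustrated on a toy problem. The sketch below is a brute-force illustration only, not the omnitig algorithm of the paper: it assumes that every solution has already been enumerated as a linear string, and that a partial solution is a substring. The name `safe_substrings` is hypothetical. It reports a string only if it occurs in every solution (safe) and reports all maximal such strings (complete, up to maximality).

```python
def safe_substrings(solutions):
    """Return the maximal substrings common to every string in `solutions`.

    Brute force, for illustration: enumerating all solutions is exactly
    what a practical safe-and-complete algorithm must avoid.
    """
    if not solutions:
        return []
    first = solutions[0]
    # Any safe string must be a substring of the first solution.
    candidates = {
        first[i:j]
        for i in range(len(first))
        for j in range(i + 1, len(first) + 1)
    }
    # Safe: occurs in every solution.
    safe = {s for s in candidates if all(s in sol for sol in solutions)}
    # Keep only maximal safe strings; the others are implied by them.
    maximal = {s for s in safe if not any(s != t and s in t for t in safe)}
    return sorted(maximal)
```

With the two candidate reconstructions `"ACGTAC"` and `"TACGTG"`, the call `safe_substrings(["ACGTAC", "TACGTG"])` returns `['ACGT', 'TAC']`: both strings occur in each reconstruction, and no longer common substring exists.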