2012
DOI: 10.1016/j.gpb.2012.05.006

Review of General Algorithmic Features for Genome Assemblers for Next Generation Sequencers

Abstract: In the realm of bioinformatics and computational biology, the most rudimentary data upon which all the analysis is built is the sequence data of genes, proteins and RNA. The sequence data of the entire genome is the solution to the genome assembly problem. The scope of this contribution is to provide an overview on the art of problem-solving applied within the domain of genome assembly in the next-generation sequencing (NGS) platforms. This article discusses the major genome assemblers that were proposed in th…

Cited by 38 publications (31 citation statements)
References 66 publications
“…The layout helps in producing a consensus sequence, where each base in the sequence is identified by simple majority amongst the bases at that position or via some probabilistic approach. Therefore, this “Alignment-Layout-Consensus” paradigm is used by genome assemblers to infer the novel genome, [27-35]. …”
Section: Methods (mentioning)
confidence: 99%
“…It begins the process by identifying a model, the “reference sequences”, most closely related to the set of reads. It then uses the set of reads to build on this model producing a model which overfits the data, the “novel genome”, [27,28,34,36-41]. The task of MDL is to identify the model that best describes the data and within comparative assembly framework the same meaning applies to finding the reference sequences that best describes the set of reads.…”
Section: Methods (mentioning)
confidence: 99%
“…For an extensive literature on assemblers consult [54,69-73]. The list of aligners is updated online [53].…”
Section: Platform-specific Biases (mentioning)
confidence: 99%
“…N-gram based models have been widely used in natural language processing [11-13] and bioinformatics [14,15] due to their performance and ease of implementation. In this study, we only use uni-gram features and bi-gram features.…”
Section: B Using N-gram Models To Learn Associations Between (mentioning)
confidence: 99%