2014
DOI: 10.1093/bioinformatics/btu391

Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences

Abstract: Motivation: To assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. Results: Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective …

Cited by 105 publications
(99 citation statements)
References 47 publications
“…Both the short and long reads were assembled together (described as "hybrid assembly") using standard settings of SPAdes 3.5 (14), and yielded 47 linear contigs larger than 500 bp along with one circular contig of 35.5 kb (a candidate cyanobacterial plasmid). Hybrid assembly has previously been used to improve overall draft genome quality; however, in this case, it was still fragmented because large repeated regions remained unresolved (15). To resolve these regions and close the genome, we developed an approach that involved trimming the repetitive edges from the assembled contigs (which tend to have assembly mistakes) and then submitting these trimmed contigs to SSPACE-LongReads scaffolding with the standard settings (16).…”
Section: Results
Citation type: mentioning
confidence: 99%
“…In addition, incorrect annotation or misannotation of coding sequences is common in the genome sequences deposited in the public databases (Schnoes et al, 2009). Moreover, recent high-throughput but short-read sequencing technologies (e.g., Illumina platforms) have limited power to resolve large repetitive regions, which can lead to substantial errors in the assembly of genome sequence data (Nagarajan and Pop, 2013; Utturkar et al, 2014). This is of particular relevance for the precise determination of nucleotide sequences around the inverted repeat regions of chloroplast genomes, which frequently contain ycf1 genes.…”
Section: Discrepancies and Errors In Annotation And Interpretation Of
Citation type: mentioning
confidence: 99%
“…Among these is ALLPATHS-LG (Gnerre et al, 2011), arguably the winner of the so-called Assemblathon (Earl et al, 2011). ALLPATHS-LG uses the information provided by long fragments from paired-end and mate-pair sequencing to improve the assembly, and has therefore been shown to be one of the best assembly programs that are available today (Utturkar et al, 2014). However, because of the short fragments contained in aDNA samples, this approach is not feasible for aDNA samples and other methods have to be employed.…”
Section: Introduction
Citation type: mentioning
confidence: 99%