2012
DOI: 10.1371/journal.pone.0042304

An Integrated Pipeline for de Novo Assembly of Microbial Genomes

Abstract: Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algor…

Citations: cited by 412 publications (344 citation statements)
References: 32 publications
“…In order to detect polymorphisms in regions not present in the H37Rv reference strain, we constructed a custom-made reference sequence. Briefly, the reference strain for polymorphism calling was custom made by joining the full sequence of M. tuberculosis reference strain H37Rv (GenBank accession number NC_018143.1), and contigs were de novo assembled using the A5 pipeline (14) from all short reads of a randomly selected Manila isolate that did not align to the H37Rv reference strain, as determined by the Mosaik assembler (https://code.google.com/p/mosaik-aligner/). These contigs were concatenated after the H37Rv sequence using the string NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN as a separator.…”
mentioning
confidence: 99%
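The reference construction described in the statement above can be illustrated with a minimal Python sketch. It is not the citing authors' code. The function names `build_custom_reference` and `contig_offsets` are hypothetical, the separator is written without internal whitespace on the assumption that any space in the quoted string is an extraction artifact, and placing the separator between every pair of adjacent sequences is likewise an assumption.

```python
# Separator string as quoted in the citation statement, written without
# internal whitespace (assumption: any space in the quote is an artifact).
SEPARATOR = "NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN"


def build_custom_reference(reference_seq, contigs, separator=SEPARATOR):
    """Join the reference and the contigs into one sequence.

    Assumption: the separator is placed between every pair of adjacent
    pieces (reference|contig1|contig2|...).
    """
    pieces = [reference_seq] + list(contigs)
    return separator.join(pieces)


def contig_offsets(reference_seq, contigs, separator=SEPARATOR):
    """Return the 0-based start offset of each contig in the joined sequence."""
    offsets = []
    pos = len(reference_seq)
    for contig in contigs:
        pos += len(separator)  # skip the separator preceding this contig
        offsets.append(pos)
        pos += len(contig)
    return offsets


if __name__ == "__main__":
    ref = "ACGTACGT"            # toy stand-in for the H37Rv sequence
    extra = ["TTTTGG", "CCA"]   # toy stand-ins for unaligned-read contigs
    joined = build_custom_reference(ref, extra)
    print(len(joined), contig_offsets(ref, extra))
```

Keeping the offsets makes it possible to map a variant position in the joined sequence back to the contig it came from.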
“…Polymorphisms were called against the reference pseudochromosomes using a variant ascertainment algorithm (VAAL) (31). The A5 pipeline was used for de novo assembly of newly sequenced GBS strains (32). Contigs >100 nucleotides in length were then used to search the NCBI nonredundant database using BLAST (33).…”
Section: Methods
mentioning
confidence: 99%
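The length filter mentioned in the statement above (keeping only contigs longer than 100 nucleotides before the BLAST search) could look roughly like the following Python sketch. The helper names `read_fasta` and `filter_contigs` are hypothetical, the FASTA parsing is deliberately simplified, and nothing here reflects how the citing authors actually implemented the step.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (name, sequence) tuples.

    Simplified parser: sequence lines that appear before the first
    header are ignored.
    """
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def filter_contigs(records, min_length=100):
    """Keep contigs strictly longer than min_length nucleotides."""
    return [(name, seq) for name, seq in records if len(seq) > min_length]


if __name__ == "__main__":
    toy = [">contig_1", "A" * 60, "A" * 60, ">contig_2", "ACGT"]
    for name, seq in filter_contigs(read_fasta(toy)):
        print(name, len(seq))
```

The comparison is strict, so a contig of exactly 100 nucleotides is dropped, matching the "greater than 100" wording of the statement.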
“…The multiplexed sequencing reads were parsed and barcode information was removed using onboard software. The A5 pipeline was used for de novo assembly of newly sequenced GAS strains (27). The trimmed and untrimmed emm type reference databases were downloaded directly from CDC ftp sites (ftp://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm and ftp://ftp.cdc.gov/pub/infectious_diseases/biotech/emmsequ, respectively).…”
Section: Methods
mentioning
confidence: 99%