2018
DOI: 10.1186/s12859-018-2208-0

NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model

Abstract: Background: The PacBio sequencing platform offers longer read lengths than the second-generation sequencing technologies. It has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. Due to its extremely wide range of application areas, fast sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of subsequent analysis tools. Although there are several available simulators (e.g., PBSIM, SimLoRD and FASTQ…

Citations: cited by 30 publications (19 citation statements)
References: 32 publications
“…The E. coli MG1655 genome sequence (with the length of 4,614,652 bp) from NCBI (No. NC_000913.3) was downloaded and inputted to the NPBSS simulator [19] for generating the PacBio simulated reads with different error rates. As a result, 6 simulated datasets with 5, 10, 15, 20, 25 and 30% error rates were generated.…”
Section: Results (mentioning)
confidence: 99%
“…In order to estimate the capability of smsMap for mapping reads that span structural variations (SVs), we used another simulation dataset from chr1 of NA12878 with SVs. The simulation dataset with SVs was generated by inserting 7 SVs (i.e., 3 insertions, 3 deletions and 1 inversion) from DGV [38] into the reference chr1 and using the NPBSS simulator [19] at 20x coverage. Among the simulated reads, a total of 185 reads cover the SVs breakpoints.…”
Section: Results (mentioning)
confidence: 99%
“…No matter which approach is taken, the essential part is to have firsthand experience to select proper computational design and pipeline and to accurately interpret analyzed genome data. Due to its extensive range of analytical tools and application areas, employing an effective simulator (from the quality of raw reads to assembly evaluation) has become an essential step for benchmarking genomic and bioinformatics analyses [92][93][94]. In simulations, considering a (very) large number of datasets is generally not a problem, except when the analysis of each dataset is hugely computationally expensive (e.g., in the genome assembly stage).…”
Section: Step 7: Choose the Best Computational Design And Pipeline (mentioning)
confidence: 99%
“…Simulators tailored on Pacific Bioscience SMRT sequencing are more common. In particular, the SimLoRD [17] can only simulate circular consensus sequence reads (CCS) while PaSS [18] and NPBSS [19] can also produce continuous long reads (CLR). Due to its complexity and rapid evolution, the Oxford Nanopore Sequencing simulation is more challenging and hence less common.…”
Section: Related Work (mentioning)
confidence: 99%