2019
DOI: 10.1186/s12864-019-5475-x

Improving the sensitivity of long read overlap detection using grouped short k-mer matches

Abstract: Background: Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The increased read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and characterize the intra-species variations. It also holds the promise to decipher the community structure in complex microbial communities because long reads help metagenomic assembly. One key step in genome…

Citations: cited by 6 publications (4 citation statements)
References: references 38 publications (57 reference statements)
“…In this step, in order to intuitively analyze the similarity between the reference sequence and the query sequence, reads with a long length and close to the region of the reference sequence are optimally selected and aligned based on the region of the reference sequence. During the first basic alignment step, only reads with overlaps between the reference sequence and query sequence are aligned [18]. Read alignment results in the reads overlapping in a specific region of the reference sequence, and an additional analysis step is needed to accurately select the reads most similar to the reference sequence.…”
Section: Methods (mentioning)
confidence: 99%
“…Third generation: Single Molecule Real-Time (SMRT) sequencing method commercialized by Pacific Biosciences, produces 100-200 gigabases per single 20-hour run, with approximately 30000 bp read lengths (Du et al., 2019). Despite its high throughput, SMRT lacks the raw sequence accuracy of pyrosequencing at 87% compared to 99% (Du et al., 2019). The cost per one million bases is $10 compared to approximately $2400 for pyrosequencing (Liu et al., 2012).…”
Section: Genome Sequencing and Genomic Data Analysis (mentioning)
confidence: 99%
“…One approach is to use a small k-mer size and identify pairs (39) or groups (40) of them clustered tightly together, and it has been studied how to design the sampling distribution of seeds to optimize alignment sensitivity (41, 42). Multi-seed methods are robust to any mutation type and have shown to, e.g., improve overlap detection between long reads (43). However, they still match single k-mers individually and group them based on statistics after individual k-mer hits have been found.…”
Section: Introduction (mentioning)
confidence: 99%
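
The idea mentioned in the last statement (match short k-mers individually, then keep only those that cluster tightly together) can be illustrated with a minimal Python sketch. This is not the method of the cited paper or of any particular tool: the function names, the diagonal-based grouping rule, and the parameters `k`, `max_gap` and `min_hits` are assumptions chosen for illustration only.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def grouped_kmer_hits(read_a, read_b, k=4, max_gap=10, min_hits=3):
    """Group exact k-mer matches that share a diagonal and lie close together.

    A hit is a pair (i, j) with read_a[i:i+k] == read_b[j:j+k]; its diagonal
    is i - j. Hits on one diagonal are chained while consecutive start
    positions in read_a differ by at most max_gap (hypothetical rule).
    Only chains with at least min_hits hits are reported.
    """
    index_b = kmer_positions(read_b, k)
    by_diagonal = defaultdict(list)
    for i in range(len(read_a) - k + 1):
        for j in index_b.get(read_a[i:i + k], ()):
            by_diagonal[i - j].append(i)

    groups = []
    for diag, starts in by_diagonal.items():
        starts.sort()
        current = [starts[0]]
        for pos in starts[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)
            else:
                if len(current) >= min_hits:
                    groups.append((diag, current))
                current = [pos]
        if len(current) >= min_hits:
            groups.append((diag, current))
    return groups


if __name__ == "__main__":
    a = "ACGTACGGTCA"
    b = "TT" + a  # read b contains read a shifted by two bases
    print(grouped_kmer_hits(a, b))
```

In this sketch an isolated match (for example a short k-mer that recurs by chance) is discarded because it never reaches `min_hits`, while a run of matches on one diagonal is kept as a candidate overlap. Real long-read data contains insertions and deletions, so a practical grouping rule would have to tolerate small diagonal shifts, which this example does not attempt.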