2015
DOI: 10.1093/bioinformatics/btv053

Starcode: sequence clustering based on all-pairs search

Abstract: Motivation: The increasing throughput of sequencing technologies offers new applications and challenges for computational biology. In many of those applications, sequencing errors need to be corrected. This is particularly important when sequencing reads from an unknown reference such as random DNA barcodes. In this case, error correction can be done by performing a pairwise comparison of all the barcodes, which is a computationally complex problem. Results: Here, we address this challenge and describe an exact…
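To make the idea of all-pairs barcode comparison concrete, here is a minimal Python sketch. It is not Starcode's algorithm (the abstract is truncated before the method is described); it is a naive quadratic baseline under assumed conventions: barcodes are plain strings, similarity is Levenshtein distance, and each barcode is merged into the most abundant earlier barcode within `max_dist`. The function names `levenshtein` and `collapse_barcodes` are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance
    (insertions, deletions and substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def collapse_barcodes(reads, max_dist=1):
    """Naive collapse (illustrative only, not Starcode's method).

    Barcodes are visited from most to least abundant. Each one is added to
    the first existing centroid within max_dist, otherwise it becomes a new
    centroid. Returns a dict mapping centroid -> total read count.
    """
    counts = Counter(reads)
    # Ties in abundance are broken alphabetically to keep the result deterministic.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    clusters = {}
    for seq in ordered:
        for centroid in clusters:  # dicts keep insertion order: most abundant first
            if levenshtein(seq, centroid) <= max_dist:
                clusters[centroid] += counts[seq]
                break
        else:
            clusters[seq] = counts[seq]
    return clusters


if __name__ == "__main__":
    example = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 3 + ["TTT"]
    print(collapse_barcodes(example, max_dist=1))
```

Every barcode may be compared against every centroid, so the cost grows quadratically with the number of distinct barcodes. That is the computational burden the abstract refers to.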

Citations: cited by 202 publications (180 citation statements)
References: 17 publications
“…Here, we used improved TRIP protocols and analysis tools (Zorita et al 2015) to systematically assay the magnitude of position effects on housekeeping promoters in the Drosophila genome. We obtained a data set of ∼85,000 insertions in Kc167 cells, which, to our knowledge, is the largest of this kind to date.…”
Section: Discussion (mentioning)
confidence: 99%
“…Barcodes were clustered using Starcode (Zorita et al 2015), allowing two errors. Contaminant reads (where the barcode belongs to another promoter library) and barcodes with less than 100 reads were removed.…”
Section: Data Sets For Bioinformatic Analyses (mentioning)
confidence: 99%
“…DNA cleavage sites were identified computationally using a cleavage scoring system described in Supplemental Figure 1. The resulting multiplex Digenome-captured sites were classified into 11 groups by edit distance (Zorita et al 2015). The computer programs used for identification of in vitro RGEN cleavage sites and classification of these Digenome-captured sites by edit distance are available at our website (www.rgenome.net/digenome).…”
Section: Whole-genome and Digenome Sequencing (mentioning)
confidence: 99%
“…This resulted in 27,822,356 total reads after merging the runs together. Barcode read counts for S0 were generated by extracting the 20 bp sequence corresponding to the barcode region from the Read 2 sequences and using Starcode 33 to collapse barcodes within a Levenshtein distance of 1 (Supp. Fig.…”
Section: Dropsynth Barcode Design (mentioning)
confidence: 99%
“…The resulting 294 bp amplicon was size-selected using gel-extraction, purified, pooled, and loaded onto a Hiseq 2000 single-end 50 cycle run using custom sequencing primers mi4_R1 and mi4_index, resulting in 138 million total reads. The barcodes for each sample were clustered using Starcode 33 to collapse barcodes within a Levenshtein distance of 1 (Supp. Table 1).…”
Section: Complementation Assay (mentioning)
confidence: 99%