SeqScreen: a biocuration platform for robust taxonomic and biological process characterization of nucleic acid sequences of interest

Albin, Dreycey; Muthu, Pravin; Godbold, Gene D.; Lindvall, Mikael; Diep, Madeline; Porter, Adam; Pop, Mihai; Ternus, Krista; Treangen, Todd J.; Nasko, Dan; Elworth, R. A. Leo; Lu, Jacob S.; Balaji, Advait; Diaz, Christian; Shah, Nidhi; Selengut, Jeremy D.; Hulme-Lowe, Chris

doi:10.1109/bibm47256.2019.8982987

Cited by 8 publications

(6 citation statements)

References 34 publications


“…Another approach is to switch to using existing databases that apply stringent taxonomic standards in curation, such as NCBI, but doing so would drastically reduce coverage of variant sequences. Ultimately, however, it will likely be advisable to shift away from BLAST against NCBI and towards emerging methods that have been specifically tailored for pathogen identification, such as FAST-NA Scanner [24], ThreatSeq [25], SeqScreen [26], or SecureDNA [27].…”

Section: Discussion (mentioning)

confidence: 99%

Studying Pathogens Degrades BLAST-based Pathogen Identification

Beal

Clore

Manthey

2022

Preprint


As synthetic biology becomes increasingly capable and accessible, it is likewise increasingly critical to be able to make accurate biosecurity determinations regarding the pathogenicity or toxicity of particular nucleic acid or amino acid sequences. At present, this is typically done using the BLAST algorithm to determine the best match with sequences in the NCBI databases. Neither BLAST nor the NCBI databases, however, are actually designed for biosafety determination. Critically, taxonomic errors or ambiguities in the NCBI databases can also cause errors in BLAST-based taxonomic categorization. With heavily studied taxa and frequently used biotechnology tools, even low frequency taxonomic categorization issues can lead to high rates of errors in biosecurity decision-making. Here we focus on the implications for false positives, finding that NCBI BLAST will now incorrectly categorize a number of commonly used biotechnology tool sequences as the pathogens or toxins with which they have been used. Paradoxically, this implies that problems are expected to be most acute for the pathogens and toxins of highest interest and the most widely used biotechnology tools. We thus conclude that biosecurity tools should shift away from BLAST against NCBI and towards new methods that are specifically tailored for biosafety purposes.


“…Prior to running KOMB, we implemented a homogenizing step where only reads having length equal to the longest read length per sample were kept (mostly 100 bp) and the rest were discarded. Functional characterization of unitigs obtained and marked from the anomaly detection stage is done through SeqScreen [69], [70]. Anomalous unitigs are determined by considering all unitigs whose dmp score (see Eq.…”

Section: Methods (mentioning)

confidence: 99%

KOMB: K-core based de novo characterization of copy number variation in microbiomes

Balaji

Sapoval

Seto

et al. 2022

Computational and Structural Biotechnology Journal

Self Cite


“…KOMB was then run with the parameter -k (kmer-size) 51. Functional characterization of unitigs obtained and marked from the anomaly detection stage is done through SeqScreen [52,53]. Anomalous unitigs are determined by considering all unitigs whose dmp score (see Equation 1) is above a cutoff score as determined in Equation 3.…”

Section: Shakya Synthetic Metagenome (mentioning)

confidence: 99%

KOMB: Graph-Based Characterization of Genome Dynamics in Microbial Communities

Balaji

Sapoval

Seto

et al. 2020

Preprint

Self Cite


Background: Taxonomic classification of microbiomes has provided tremendous insight into the underlying genome dynamics of microbial communities but has relied on known microbial genomes contained in curated reference databases.

Methods: We propose K-core graph decomposition as a novel approach for tracking metagenome dynamics that is taxonomy-oblivious. K-core performs hierarchical decomposition which partitions the graph into shells containing nodes having degree at least K called K-shells, yielding O(E + V) complexity.

Results: The results of the paper are two-fold: (1) KOMB can identify homologous regions efficiently in metagenomes, (2) KOMB reveals community profiles that capture intra- and inter-genome dynamics, as supported by our results on simulated, synthetic, and real data.

Background

Graph-based representations and analyses paved the way for several advances in computational biology over the last few decades [1][2][3]. This is particularly evident in the progress made in the field of genome assembly, both for isolate genome assembly [4,5] and metagenome assembly, as well as efficient detection of structural variants [6-8] using genome graphs [9][10][11]. Indeed, state-of-the-art graph-based metagenome assemblers [12-15] have achieved remarkable improvements in both run-time and accuracy in recent years [16] through the use of efficient data structures and clever heuristics. Recent examples include compact De Bruijn graph construction and traversal for assembly [17,18] as well as scaffold graphs for metagenomic samples that can generate scaffolds from contiguous overlapping sequences [...] regions into various shells that can then be used to analyze genomic variation in the sample. We show that the distribution of nodes could lead to a new methodology that describes metagenomic community structure based on sample specific signatures obtained from KOMB profiles. In Methods, found towards the end of the manuscript, we describe the pipeline of the tool, explain unitig graph construction, and elaborate on the concept of K-core decomposition. In the Results section, we provide a rigorous validation of our novel K-core decomposition tool KOMB as applied to unitig graphs constructed from simulated data as well as synthetic and real metagenomes. We demonstrate its effectiveness in identifying repetitive regions across sample types and sizes and illustrate how KOMB profiles can be used to visualize community structure. Finally, in the Discussion and Conclusions we cover the salient points and main conclusions from our study and lay out future directions of our research.

Results

We present a thorough validation of KOMB as applied to various simulated, synthetic, and real datasets. We do this through three major sets of experiments. First, we demonstrate the efficacy of the application of the K-core decomposition algorithm in genomics by testing it on simulated genomes constructed as random sequences to wh...
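
The K-core decomposition named in the abstract above can be illustrated with a minimal sketch. This is not KOMB's implementation: it assumes the standard textbook definition (the K-core is the largest subgraph in which every node has degree at least K, and the K-shell is the set of nodes whose core number is exactly K), it takes a plain undirected edge list rather than a unitig graph, and the function names `core_numbers` and `k_shells` are hypothetical. It uses the usual bucket-based peeling order, which visits each node and edge a constant number of times.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as (u, v) pairs.

    Illustrative peeling algorithm; self-loops are ignored and duplicate
    edges are collapsed (both are assumptions of this sketch).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)

    # buckets[d] holds the not-yet-removed nodes whose current degree is d.
    buckets = [set() for _ in range(max_deg + 1)]
    for n, d in degree.items():
        buckets[d].add(n)

    core = {}
    for d in range(max_deg + 1):
        while buckets[d]:
            n = buckets[d].pop()
            core[n] = d  # n is peeled at level d
            for m in adj[n]:
                # Only neighbours still above level d lose a degree, so no
                # remaining node ever drops below the current level.
                if m not in core and degree[m] > d:
                    buckets[degree[m]].remove(m)
                    degree[m] -= 1
                    buckets[degree[m]].add(m)
    return core


def k_shells(edges):
    """Group nodes by core number: {k: set of nodes in the k-shell}."""
    shells = defaultdict(set)
    for node, k in core_numbers(edges).items():
        shells[k].add(node)
    return dict(shells)


if __name__ == "__main__":
    # A triangle (a, b, c) with one pendant node d attached to c.
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(k_shells(toy_edges))
```

In the toy graph, the pendant node falls in the 1-shell and the triangle forms the 2-shell, which is the kind of shell partition the abstract describes.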
