2021
DOI: 10.1101/2021.05.03.440524
Preprint

Semi-supervised identification of SARS-CoV-2 molecular targets

Abstract: SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. In this work, we analyzed a corpus of 66,000 SARS-CoV-2 genome sequences. We developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on use of a single reference genome and by overcoming atypical genome traits. Using this method, we identified the comprehensive set of known pr…

Citations: cited by 1 publication (6 citation statements)
References: 32 publications
“…Additionally, even though a complete analysis with the latest variants of concern was out of the scope for this paper, in light of the outbreak and seriousness of Delta variant of SARS-CoV-2, we got 2 frequently seen sequences of Spike glycoprotein for Delta variant and checked if our top Spike T-Cell and Spike B-Cell epitopes are present on this Spike sequence. The sequences were obtained from [36] and are included in Supplemental files SD1. As seen here [36], these sequences have been found in other variants of concern as well.…”
Section: Results Verification (mentioning)
confidence: 99%
“…For this study, we utilized ∼28K unique SARS-CoV-2 protein sequences and ∼49K related protein domains and annotations identified from a collection of ∼62K high quality SARS-CoV-2 genomes from NCBI GenBank [39] and GISAID [40] using the methods described in Beck, et al. [36] (full protein and domain sequences can be found with that publication). The proteins and domains identified by FGP are related to the respective source's genome accession.…”
Section: SARS-CoV-2 Genomic Data (mentioning)
confidence: 99%