2022
DOI: 10.3390/biology11030418

PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences

Abstract: The study of host specificity has important connections to the question about the origin of SARS-CoV-2 in humans which led to the COVID-19 pandemic—an important open question. There are speculations that bats are a possible origin. Likewise, there are many closely related (corona)viruses, such as SARS, which was found to be transmitted through civets. The study of the different hosts which can be potential carriers and transmitters of deadly viruses to humans is crucial to understanding, mitigating, and preven…

Cited by 42 publications (23 citation statements)
References 76 publications
“…Converting input data into fixed-length numerical vectors for applying different machine learning algorithms such as classification and clustering is a common practice across numerous fields like smart grid [14], [15], graph analytics [16], [17], [18], [19], [20], electromyography [21], clinical data analysis [22], network security [23], and text classification [24]. Authors in [5] use the position weight matrix-based approach to compute feature embeddings for spike sequences. Although their approach shows promising results, one drawback of their method is that it only works for aligned data.…”
Section: Related Work (mentioning)
confidence: 99%
“…See Figure 1 for an illustration of the SARS-CoV-2 genomic structure, including the region (the spike region) that codes the spike protein. It is hence important to use this source of information to identify different host specificity [5] and variants [4]. This motivates approaches for classifying coronavirus spike sequences to better understand the dynamics of the different variants in terms of this information.…”
Section: Introduction (mentioning)
confidence: 99%
“…The PSSM2Vec embedding is based on the idea of Position Specific Scoring Matrix (PSSM), also called position weight matrix (PWM) [38,39]. For a given nucleotide sequence s, PSSM2Vec designs the PWM.…”
Section: PSSM2Vec (mentioning)
confidence: 99%
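
The position weight matrix idea named in the statement above can be illustrated with a minimal Python sketch. It is a generic textbook-style PWM built from the k-mers of a single sequence, not a reproduction of the PWM2Vec or PSSM2Vec implementations. The function names, the uniform background distribution, the pseudocount of 1, and the choice to score each k-mer by summing its PWM entries are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping substrings of length k (sliding window, step 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def position_weight_matrix(seq, k, alphabet="ACGT", pseudocount=1.0):
    """Build a PWM (one dict per k-mer position) from the k-mers of seq.

    Assumptions (not from the source): uniform background frequencies and
    additive smoothing with `pseudocount`. Characters outside `alphabet`
    are not supported by this sketch.
    """
    windows = kmers(seq, k)
    background = 1.0 / len(alphabet)
    total = len(windows) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(k):
        # Count how often each character occurs at this position across k-mers.
        counts = Counter(w[pos] for w in windows)
        column = {}
        for ch in alphabet:
            prob = (counts.get(ch, 0) + pseudocount) / total
            # Log-odds of the smoothed probability against the background.
            column[ch] = math.log2(prob / background)
        pwm.append(column)
    return pwm


def kmer_scores(seq, k, alphabet="ACGT"):
    """Score every k-mer by summing its PWM entries; one number per k-mer."""
    pwm = position_weight_matrix(seq, k, alphabet)
    return [sum(pwm[pos][ch] for pos, ch in enumerate(w)) for w in kmers(seq, k)]


embedding = kmer_scores("ACGTACGTAC", k=3)
print(len(embedding))  # 8, i.e. len(seq) - k + 1
```

In this sketch the output length is `len(seq) - k + 1`, so two sequences yield vectors of the same length only if the sequences themselves have equal length. That is one way to see why an approach of this shape would need aligned input, as the Related Work statement quoted earlier notes.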
“…Fast and efficient solutions to the clade assignment problem would help in tracking current and evolving strains and it is crucial for the surveillance of the pathogen. This classification problem has been attacked with machine learning approaches [3, 4, 5] using the Spike protein amino acid sequence to drive the classification step.…”
Section: Introductionmentioning
confidence: 99%