2021
DOI: 10.1101/2021.06.29.450335
Preprint

Classifying COVID-19 variants based on genetic sequences using deep learning models

Abstract: The COrona VIrus Disease (COVID-19) pandemic led to the occurrence of several variants with time. This has led to an increased importance of understanding sequence data related to COVID-19. In this chapter, we propose an alignment-free k-mer based LSTM (Long Short-Term Memory) deep learning model that can classify 20 different variants of COVID-19. We handle the class imbalance problem by sampling a fixed number of sequences for each class label. We handle the vanishing gradient problem in LSTMs arising from l…
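
As a rough illustration of two steps the abstract names, alignment-free k-mer tokenization and sampling a fixed number of sequences per class, here is a minimal Python sketch. It is not the authors' code: the function names `kmer_tokens` and `balanced_sample`, the default `k=3`, the choice to skip windows containing non-ACGT characters, and the choice to keep every sequence when a class has fewer than the requested number are all assumptions made for this example.

```python
import random
from collections import defaultdict


def kmer_tokens(sequence, k=3):
    """Split a nucleotide sequence into overlapping k-mers.

    Windows containing anything other than A, C, G, or T are skipped
    (an assumption for this sketch, not a detail from the abstract).
    """
    seq = sequence.upper()
    valid = set("ACGT")
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    ]


def balanced_sample(records, per_class, seed=0):
    """Draw up to `per_class` (sequence, label) pairs for every label.

    `records` is an iterable of (sequence, label) tuples. Classes with
    fewer than `per_class` sequences are kept whole (hypothetical choice).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for seq, label in records:
        by_label[label].append(seq)

    sample = []
    for label in sorted(by_label):
        seqs = by_label[label]
        chosen = rng.sample(seqs, min(per_class, len(seqs)))
        sample.extend((s, label) for s in chosen)
    return sample
```

The resulting token lists would still need to be mapped to integer indices and padded or truncated to a common length before being fed to a sequence model such as an LSTM; that step is left out here.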

Citations: cited by 11 publications (8 citation statements)
References: 46 publications
“…For example, we can extract k-mer (short subsequence) frequencies or other combinations of bases/amino acids and use them as features to train classifiers using naive Bayes classifier (NBC), support vector machines (SVM), decision tree-based methods, and neural networks (43–49). Machine learning with k-mer features has been used for SARS-CoV-2 to identify genetic fingerprints of specific infections (50), classify variants (51, 52), and train a model to predict the pathogenicity of unknown viruses (53). Another approach is to build profile hidden Markov models (HMMs), which can identify taxonomic lineages and variants of viruses.…”
Section: Can Deep Sequence Learning Help?
Citation type: mentioning
confidence: 99%
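
The k-mer frequency features mentioned in the statement above can be sketched as follows. This is a minimal example under stated assumptions, not code from any of the cited works: the function name `kmer_frequencies`, the default `k=2`, and the restriction to the A/C/G/T alphabet are choices made for illustration. The output is a fixed-length vector that could be passed to any of the classifier families the statement lists.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Return the relative frequency of every possible k-mer over ACGT.

    The vector has 4**k entries in lexicographic k-mer order. K-mers that
    contain other characters (for example N) are ignored, and a sequence
    with no valid k-mer yields an all-zero vector.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocab)
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]
```

Because the vector length depends only on `k`, sequences of different lengths map to the same feature space without any alignment step.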
“…Many models are developed to predict a possible infection of patients only by the reported symptoms, as the review of Huang et al shows (3). For example, Manni et al (4) developed a logistic regression to distinguish between index and contact persons and Drew et al (5) published an app which revealed symptom combinations which are predictive for a positive COVID-19 test. Spinato et al (6) created and validated a differentiated questionnaire to identify index cases based on symptomatology.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…[82] Another work applied Long-Short Term Memory RNNs (LSTM) to classify sequences to lineages. [83]…”
Section: Background and Related Work
Citation type: mentioning
confidence: 99%
“…[82] Another work applied Long-Short Term Memory RNNs (LSTM) to classify sequences to lineages. [83] While deep learning methods can identify complex features within data that allow classification, that strength comes with a major weakness: understanding what the deep model focused on in learning the classification and explaining its predictions. Accordingly, interpretable machine learning has emerged as a significant area of research.…”
Section: Evolution Of Coronavirus Lineages and The SARS-CoV-2 Spike P...
Citation type: mentioning
confidence: 99%