2022
DOI: 10.21203/rs.3.rs-1062190/v2
Preprint

An explainable model of host genetic interactions linked to COVID-19 severity

Abstract: We employed a multifaceted computational strategy to identify the genetic factors contributing to increased risk of severe COVID-19 infection from a Whole Exome Sequencing (WES) dataset of a cohort of 2000 Italian patients. We coupled a stratified k-fold screening, to rank the variants most associated with severity, with the training of multiple supervised classifiers, to predict severity on the basis of the screened features. Feature importance analysis from tree-based models allowed us to identify a set of 16 variants …

Cited by 2 publications (13 citation statements)
References 12 publications
“…Indeed, multiple studies have already been conducted identifying potential susceptibility loci in the human genome that may put patients at increased risk of death or other severe outcomes, including mutations in genes linked to immune response, blood clotting and mucus production [72][73][74][75]. In particular, a recent study using machine learning approaches such as XGBoost identified variants from whole exome sequencing associated with severe COVID-19 [76]. These data identified associations between age, gender, and 16 variants linked to immune system and inflammatory processes able to predict severe outcomes with high accuracy.…”
Section: Plos One
Citation type: mentioning
confidence: 99%
“…We developed the HGSP model by combining trained decision tree-based models (Random Forest and XGBoost classifiers) from a 5-Fold CV split of the original problem dataset. See a prior study from Onoja et al, 19 for further details. The HGSP combined these models via an ensemble "VotingClassifier" approach from the "sklearn.ensemble" python library module to aggregate the individual classifiers based on their prediction probabilities (soft margin) of the outcome.…”
Section: Host Genetic Severity Predictor Model Development
Citation type: mentioning
confidence: 99%
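
The "soft" aggregation mentioned in the statement above can be illustrated with a minimal sketch. It is not the cited authors' code: the function name `soft_vote` and all probability values are hypothetical, and plain Python stands in for the fitted Random Forest and XGBoost models. Soft voting averages each model's predicted class probabilities and picks the class with the highest mean, which is also what scikit-learn's `VotingClassifier` does when constructed with `voting="soft"`.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across models; return the argmax class per sample.

    prob_sets[m][i][c] is model m's probability that sample i belongs to class c.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    labels = []
    for i in range(n_samples):
        n_classes = len(prob_sets[0][i])
        # Unweighted mean probability of each class over all models.
        mean = [sum(m[i][c] for m in prob_sets) / n_models for c in range(n_classes)]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


# Hypothetical probabilities [P(non-severe), P(severe)] for three patients.
rf_probs = [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]]   # stand-in for a Random Forest
xgb_probs = [[0.80, 0.20], [0.30, 0.70], [0.30, 0.70]]  # stand-in for XGBoost

predictions = soft_vote([rf_probs, xgb_probs])
print(predictions)
```

For the third patient the two stand-in models disagree on the most likely class, and the averaged probabilities decide the outcome. This is the difference from "hard" voting, which counts only each model's predicted label.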
“…We consider the following case studies for model validation using the 16 In this study, we utilized the HGSP model we developed from a prior study of training decision tree-based models (Random Forest and XGBoost classifiers) combined from across a 5-fold CV (see further details in Onoja et al, 19). The HGSP model voting classifier was developed from high-performance machine learning algorithms that have some interpretability abilities due to their recursive tree-based decision system. We used this approach rather than adopting a complex model such as a deep neural network model to minimize the risk of overfitting and avoid the black box approximation of the problem considering their internal model mechanisms are difficult to interpret.…”
Section: External Model Validation
Citation type: mentioning
confidence: 99%