2021
DOI: 10.1016/j.patter.2020.100178

SIMON: Open-Source Knowledge Discovery Platform

Cited by 17 publications
(10 citation statements)
References 54 publications
“…We also applied machine learning (ML) to define clinical and PB protein markers that predict disease outcome (recovered vs fatal) or progression (early vs late death). First, the combined clinical and PB dataset was partitioned in training, validation and test sets for building models using 171 ML algorithms in SIMON (Sequential Iterative Modeling ‘Over Night’) (Tomic et al, 2019; Tomic et al, 2021) to identify the best performing ML predictive models and order the best clinical/PB parameters for prediction by feature selection. Based on metrics of performance, we selected the random forest (RF) model and sorted the clinical/PB parameters in descending order by their permutation-based importance (mean decrease accuracy or increase out-of-bag error), averaged over 50 RF runs (Figures S2C-F) (Ganggayah et al, 2019; Jiang, 2020; Ludemann et al, 2006; Speiser et al, 2019; Tuleau-Malot, 2022; Wickham, 2020; Wiener, 2002).…”
Section: Results (mentioning)
confidence: 99%
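
The ranking step in the statement above (sort parameters by permutation-based importance, averaged over repeated runs) can be illustrated with a minimal sketch. This is not the cited authors' code and does not use SIMON or a random forest: `accuracy`, `permutation_importance`, and `rank_features` are hypothetical helper names, the model is any `predict` callable, and the average is taken over 50 random permutations of one fixed model rather than over 50 refitted RF runs.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=50, seed=0):
    """Mean decrease in accuracy per feature after shuffling that feature.

    Assumption: rows is a list of equal-length lists, one value per feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def rank_features(names, importances):
    """Sort (name, importance) pairs in descending order of importance."""
    return sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
```

A feature the model never reads gets an importance of exactly zero, because shuffling it cannot change any prediction; this is why the measure is useful for ordering candidate markers.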
“…Tailored ML frameworks and platforms that account for the idiosyncrasies of the underlying data have been published for applications in genomics [39,40], proteomics [41,42], biomedicine [43], and chemistry [44]. Their creation recognizes the infeasibility to define, implement, and train appropriate ML models by relying solely on generic ML frameworks such as scikit-learn [45] or PyTorch [46].…”
Section: Introduction (mentioning)
confidence: 99%
“…Pairwise comparisons within groups were performed using Wilcoxon matched-pairs signed-rank tests. Principal component analysis (PCA) was performed using SIMON software version 0.2.1 (https://genular.org) (37). Before PCA was performed, the data was pre-processed (center/scale), missing values were median imputed and variables with fewer than 5 unique values were removed.…”
Section: Discussion (mentioning)
confidence: 99%
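
The pre-processing described in the statement above (median imputation, removal of variables with fewer than 5 unique values, centering and scaling) might look roughly like the following sketch. It is not SIMON's implementation: the function name `preprocess`, the column-dictionary input, the order of the steps, counting unique values on observed (non-missing) entries, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, median, stdev


def preprocess(columns, min_unique=5):
    """Prepare numeric columns for PCA.

    columns: dict mapping a variable name to a list of floats, with None
    marking a missing value (an assumed input format).
    Returns a new dict holding only the retained, standardized columns.
    """
    prepared = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        # Drop variables with too few distinct observed values.
        if len(set(observed)) < min_unique:
            continue
        # Median-impute missing entries.
        med = median(observed)
        filled = [med if v is None else v for v in values]
        # Center to mean 0 and scale to unit sample standard deviation.
        mu = mean(filled)
        sd = stdev(filled)
        if sd == 0:
            continue  # constant column carries no variance for PCA
        prepared[name] = [(v - mu) / sd for v in filled]
    return prepared
```

The standardized columns returned here would then be passed to whatever PCA routine is in use; the PCA step itself is left out of the sketch.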