2014 IEEE International Conference on Healthcare Informatics
DOI: 10.1109/ichi.2014.10
Handling Sparsity with Random Forests When Predicting Adverse Drug Events from Electronic Health Records

Abstract: When using electronic health record (EHR) data to build models for predicting adverse drug effects (ADEs), one is typically facing the problem of data sparsity, i.e., drugs and diagnosis codes that could be used for predicting a certain ADE are absent for most observations. For such tasks, the ability to effectively handle sparsity by the employed machine learning technique is crucial. The state-of-the-art random forest algorithm is frequently employed to handle this type of data. It has however recently been …

Cited by 17 publications (24 citation statements). References 16 publications.
“…The number of variables randomly sampled as candidates at each split (mtry) and the number of trees to grow (ntree) were tuned (grid search) using the caret package (Kuhn, 2008) to obtain the optimal predictive ability, stability and accuracy. RF analysis was performed separately on the PTR‐ToF‐MS and GC‐MS datasets, to avoid the possible influence of scale difference and sparsity difference (Karlsson & Boström, 2014).…”
Section: Methods
confidence: 99%
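The citing study tunes the two standard random-forest hyperparameters, mtry (variables tried per split) and ntree (number of trees), by grid search in R's caret package. A minimal analogous sketch in Python with scikit-learn, where `max_features` plays the role of mtry and `n_estimators` of ntree; the synthetic data here is a hypothetical stand-in, not the study's dataset:

```python
# Sketch of grid-searching RF hyperparameters analogous to caret's
# mtry (max_features) and ntree (n_estimators). Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a high-dimensional feature matrix.
X, y = make_classification(n_samples=300, n_features=40,
                           n_informative=5, random_state=0)

param_grid = {
    "max_features": [2, 6, 20],   # ~ mtry: candidates tried per split
    "n_estimators": [50, 200],    # ~ ntree: trees in the forest
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_)
```

Cross-validated AUC is used here as the selection criterion; caret's `train` offers the same pattern via `tuneGrid` and `trainControl`.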
“…Logistic regression (used by Gilens and Page) includes an intrinsic, strong assumption of linearity (Pampel, 2000) (for details see Appendix). Random Forests (RFs) in contrast are flexible, non-linear classifiers (Breiman, 2001a) that can handle large numbers of sparsely-represented features such as the preferences of the 43 individual IGs in the Gilens dataset (Karlsson, 2014). In addition, RFs have natural metrics to assess the relative importance of each feature.…”
Section: Gilens and Page Applied Classical Methods of Statistics and …
confidence: 99%
“…The final prediction of a test case is an average of the trees' predictions of that case. RFs are adept at handling sparse datasets with a large number of features (Karlsson, 2014), though removing uninformative features can improve model accuracy. The accuracies of two other flexible classifiers, XGBoost (Chen and Guestrin, 2016) (a flavor of RF) and Neural Nets, were similar to standard RFs (results not reported).…”
Section: Random Forest Models
confidence: 99%
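The citing passages credit random forests with handling sparse, mostly-absent features while also exposing per-feature importance metrics. A minimal sketch of both points on synthetic data (the matrix, mask rate, and labels are all hypothetical choices for illustration):

```python
# Sketch: fit a random forest on a mostly-zero (sparse) binary
# feature matrix and use built-in importances to rank features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d = 500, 30
X = np.zeros((n, d))
X[rng.random((n, d)) < 0.05] = 1.0        # ~95% of entries are zero
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only features 0 and 1 matter

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
print(ranking[:2])   # features 0 and 1 are expected to rank highest
```

Because splits simply ignore features that are zero for all samples in a node, the forest degrades gracefully as sparsity grows, and `feature_importances_` gives the "natural metric" of relative importance the citation mentions.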