2014 IEEE International Conference on Healthcare Informatics
DOI: 10.1109/ichi.2014.10
Handling Sparsity with Random Forests When Predicting Adverse Drug Events from Electronic Health Records

Abstract: When using electronic health record (EHR) data to build models for predicting adverse drug effects (ADEs), one is typically facing the problem of data sparsity, i.e., drugs and diagnosis codes that could be used for predicting a certain ADE are absent for most observations. For such tasks, the ability to effectively handle sparsity by the employed machine learning technique is crucial. The state-of-the-art random forest algorithm is frequently employed to handle this type of data. It has however recently been …

Cited by 17 publications (24 citation statements). References 16 publications.
“…The number of variables randomly sampled as candidates at each split (mtry) and the number of trees to grow (ntree) were tuned (grid search) using the caret package (Kuhn, 2008) to obtain the optimal predictive ability, stability and accuracy. RF analysis was performed separately on the PTR‐ToF‐MS and GC‐MS datasets, to avoid the possible influence of scale difference and sparsity difference (Karlsson & Boström, 2014).…”
Section: Methods
confidence: 99%
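The citing study tunes the two standard random-forest hyperparameters, mtry (variables tried per split) and ntree (number of trees), by grid search in R's caret package. A minimal analogous sketch in Python with scikit-learn, where `max_features` plays the role of mtry and `n_estimators` of ntree; the synthetic data here is a hypothetical stand-in, not the study's dataset:

```python
# Sketch of grid-searching RF hyperparameters analogous to caret's
# mtry (max_features) and ntree (n_estimators). Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a high-dimensional feature matrix.
X, y = make_classification(n_samples=300, n_features=40,
                           n_informative=5, random_state=0)

param_grid = {
    "max_features": [2, 6, 20],   # ~ mtry: candidates tried per split
    "n_estimators": [50, 200],    # ~ ntree: trees in the forest
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_)
```

Cross-validated AUC is used here as the selection criterion; caret's `train` offers the same pattern via `tuneGrid` and `trainControl`.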
“…Logistic regression (used by Gilens and Page) includes an intrinsic, strong assumption of linearity (Pampel, 2000) (for details see Appendix). Random Forests (RFs) in contrast are flexible, non-linear classifiers (Breiman, 2001a) that can handle large numbers of sparsely-represented features such as the preferences of the 43 individual IGs in the Gilens dataset (Karlsson, 2014). In addition, RFs have natural metrics to assess the relative importance of each feature.…”
Section: Gilens and Page Applied Classical Methods of Statistics and …
confidence: 99%
“…The final prediction of a test case is an average of the trees' predictions of that case. RFs are adept at handling sparse datasets with a large number of features (Karlsson, 2014), though removing uninformative features can improve model accuracy. The accuracies of two other flexible classifiers, XGBoost (Chen and Guestrin, 2016) (a flavor of RF) and Neural Nets, were similar to standard RFs (results not reported).…”
Section: Random Forest Models
confidence: 99%
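The citing passages credit random forests with handling sparse, mostly-absent features while also exposing per-feature importance metrics. A minimal sketch of both points on synthetic data (the matrix, mask rate, and labels are all hypothetical choices for illustration):

```python
# Sketch: fit a random forest on a mostly-zero (sparse) binary
# feature matrix and use built-in importances to rank features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d = 500, 30
X = np.zeros((n, d))
X[rng.random((n, d)) < 0.05] = 1.0        # ~95% of entries are zero
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only features 0 and 1 matter

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
print(ranking[:2])   # features 0 and 1 are expected to rank highest
```

Because splits simply ignore features that are zero for all samples in a node, the forest degrades gracefully as sparsity grows, and `feature_importances_` gives the "natural metric" of relative importance the citation mentions.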