Counting trees in Random Forests: Predicting symptom severity in psychiatric intake reports

Scheurwegs, Elyne; Sushil, Madhumita; Tulkens, Stéphan; Daelemans, Walter; Luyckx, Kim

doi:10.1016/j.jbi.2017.06.007

Cited by 22 publications

(22 citation statements)

References 19 publications


“…They experimented with different classifiers and approaches. The classifiers most prominently used were Support Vector Regressors [22], Decision Trees [23], Random Forests [24] and Gradient Tree Boosting [25]. The approaches included an ensemble of Convolutional Neural Networks (CNN) with word embeddings [26] and a mixture of Regularized Multinomial Logistic Regression classifiers and Neural Networks [27].…”

Section: Track 2: Symptom Severity Classification (mentioning)

confidence: 99%


A natural language processing challenge for clinical records: Research Domains Criteria (RDoC) for psychiatry

Uzuner

Stubbs

Filannino

2017

Journal of Biomedical Informatics



“…Due to the availability of unannotated data, in addition to their supervised solutions, four teams experimented with semi-supervised approaches, e.g., [22, 24]. Only three teams involved medical experts.…”

Section: Track 2: Symptom Severity Classification (mentioning)

confidence: 99%

A natural language processing challenge for clinical records: Research Domains Criteria (RDoC) for psychiatry

Uzuner

Stubbs

Filannino

2017

Journal of Biomedical Informatics


“…The goal is to decrease the correlation between individual trees, which results in diminished variance when the trees are aggregated. Random forests accommodate sparsity [12], which is favorable in this case, due to a low percentage of patients who reached the primary outcome. The individual trees are designed to overfit on features (making very specific decisions that only account for part of the data set), whereas the voting strategy mitigates these effects by generalizing over the decisions of multiple trees.…”

Section: Introduction (mentioning)

confidence: 99%
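
The variance argument in this statement can be made concrete with a standard identity. As a sketch, assume (these symbols are not in the quoted text) that the forest averages $B$ trees whose predictions $T_b(x)$ each have variance $\sigma^2$ and pairwise correlation $\rho$:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

As $B$ grows the second term vanishes, so the remaining variance is governed by $\rho$. Lowering the correlation between trees is therefore what lowers the variance of the aggregate.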

Use of machine-learning algorithms to determine features of systolic blood pressure variability that predict poor outcomes in hypertensive patients

Lacson

Baker

Suresh

et al. 2018

Clinical Kidney Journal


Background: We re-analyzed data from the Systolic Blood Pressure Intervention Trial (SPRINT) to identify features of systolic blood pressure (SBP) variability that portend poor cardiovascular outcomes using a nonlinear machine-learning algorithm.

Methods: We included all patients who completed 1 year of the study without reaching any primary endpoint during the first year, specifically: myocardial infarction, other acute coronary syndromes, stroke, heart failure or death from a cardiovascular event (n = 8799; 94%). In addition to clinical variables, features representing longitudinal SBP trends and variability were determined and combined in a random forest algorithm, optimized using cross-validation, using 70% of patients in the training set. Area under the curve (AUC) was measured using a 30% testing set. Finally, the importance of each feature was determined from its reduction in node impurity, averaged over all trees in the forest.

Results: A total of 365 patients (4.1%) reached the combined primary outcome over 37 months of follow-up. The random forest classifier had an AUC of 0.71 on the testing set. The 10 most significant features selected in order of importance by the automated algorithm included the urine albumin/creatinine (CR) ratio, estimated glomerular filtration rate, age, serum CR, history of subclinical cardiovascular disease (CVD), cholesterol, a variable representing SBP signals using wavelet transformation, high-density lipoprotein, the 90th percentile of SBP and triglyceride level.

Conclusions: We successfully demonstrated the use of a random forest algorithm to define the best prognostic longitudinal SBP representations. In addition to known risk factors for CVD, transformed variables for time series SBP measurements were found to be important in predicting poor cardiovascular outcomes and require further evaluation.
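
The evaluation protocol described in this abstract (a 70/30 split, with AUC measured on the held-out 30%) can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: the function names `split_70_30` and `auc` are hypothetical, the shuffle seed is arbitrary, and the AUC here is computed by direct pairwise comparison, which is equivalent to the usual ranking definition but only practical for small samples.

```python
import random


def split_70_30(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, test = split_70_30(range(10))
    print(len(train), len(test))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With a rare outcome such as the 4.1% event rate reported here, a plain random split can leave very few positives in the test set, so a stratified split may be preferable in practice.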


“…They predicted the labels by using an SVM classifier with re-weighted responses to reflect the unbalanced nature of the data. The Antwerp University's research team corrected misspellings and erroneously concatenated words by using hand-written regular expressions and mapped the words to UMLS concepts by using fuzzy matching rules [24]. Instead of using the entire set of mapped concepts, they restricted it to the ones related to psychiatric diagnoses by means of the Diagnostic & Statistical Manual of Mental Disorders (DSM).…”

Section: Results (mentioning)

confidence: 99%
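
The idea of mapping misspelled words to known concepts by fuzzy matching can be illustrated as follows. This is only a sketch under loud assumptions: the concept list is a hypothetical stand-in rather than UMLS, the helper `map_to_concept` and the 0.8 cutoff are invented for illustration, and the similarity measure is the one in Python's `difflib`, not the team's hand-written matching rules.

```python
from difflib import get_close_matches

# Hypothetical stand-in vocabulary; the cited system mapped to UMLS concepts.
CONCEPTS = ["depression", "anxiety", "psychosis", "insomnia"]


def map_to_concept(word, concepts=CONCEPTS, cutoff=0.8):
    """Return the closest concept name, or None if nothing is similar enough."""
    matches = get_close_matches(word.lower(), concepts, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(map_to_concept("depresion"))
    print(map_to_concept("table"))
```

A stricter cutoff reduces false mappings at the cost of missing heavier misspellings, so the threshold would need tuning against real notes.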

Symptom severity prediction from neuropsychiatric clinical records: Overview of 2016 CEGS N-GRID shared tasks Track 2

Filannino

Stubbs

Uzuner

2017

Journal of Biomedical Informatics


The second track of the CEGS N-GRID 2016 natural language processing shared tasks focused on predicting symptom severity from neuropsychiatric clinical records. For the first time, initial psychiatric evaluation records have been collected, de-identified, annotated and shared with the scientific community. One hundred ten researchers organized in twenty-four teams participated in this track and submitted sixty-five system runs for evaluation. The top ten teams each achieved an inverse normalized macro-averaged mean absolute error score over 0.80. The top performing system employed an ensemble of six different machine learning-based classifiers to achieve a score of 0.86. The task proved generally easy, with the exception of two specific classes of records: records with very few but crucial positive valence signals, and records describing patients predominantly affected by negative rather than positive valence. Those cases proved very challenging for most of the systems. Further research is required to consider the task solved. Overall, the results of this track demonstrate the effectiveness of data-driven approaches to the task of symptom severity classification.
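
One plausible reading of the "inverse normalized macro-averaged mean absolute error" named in this abstract is sketched below. The abstract does not define the metric, so everything specific here is an assumption: severity labels are taken to be integers on a 0 to 3 scale, each gold class's mean absolute error is divided by the largest error possible for that class, the per-class values are averaged with equal weight, and the result is subtracted from 1 so that higher is better. The function name is hypothetical.

```python
from collections import defaultdict


def inverse_normalized_macro_mae(gold, pred, min_label=0, max_label=3):
    """Return 1 minus the macro-averaged, per-class-normalized MAE."""
    errors_by_class = defaultdict(list)
    for g, p in zip(gold, pred):
        errors_by_class[g].append(abs(p - g))

    normalized = []
    for label, errors in errors_by_class.items():
        # Largest error any prediction can make for this gold label
        # (assumes min_label < max_label, so this is never zero).
        max_error = max(label - min_label, max_label - label)
        normalized.append(sum(errors) / len(errors) / max_error)

    # Macro average: every gold class counts equally, however rare it is.
    return 1.0 - sum(normalized) / len(normalized)


if __name__ == "__main__":
    print(inverse_normalized_macro_mae([0, 1, 2, 3], [0, 1, 2, 3]))
    print(inverse_normalized_macro_mae([0, 3], [0, 0]))
```

Macro-averaging means a system that ignores a rare severity class is penalized as heavily as one that fails on a common class, which matches the abstract's observation that a few specific record types drove most of the difficulty.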

