Evaluating the state of the art in missing data imputation for clinical data

Luo, Yuan

doi:10.1093/bib/bbab489

Cited by 61 publications

(28 citation statements)

References 27 publications

“…Nevertheless, randomized controlled trials are needed to potentially overcome this bias and establish the model performance against the standard clinical parameters. In addition, imputation methods such as MICE have been used to address the missing data issue [209].…”

Section: Discussion (mentioning)

confidence: 99%

Current and Future Applications of Artificial Intelligence in Coronary Artery Disease

et al. 2022

Cardiovascular diseases (CVDs) carry significant morbidity and mortality and are associated with substantial economic burden on healthcare systems around the world. Coronary artery disease, as one disease entity under the CVDs umbrella, had a prevalence of 7.2% among adults in the United States and incurred a financial burden of 360 billion US dollars in the years 2016–2017. The introduction of artificial intelligence (AI) and machine learning over the last two decades has unlocked new dimensions in the field of cardiovascular medicine. From automatic interpretations of heart rhythm disorders via smartwatches, to assisting in complex decision-making, AI has quickly expanded its realms in medicine and has demonstrated itself as a promising tool in helping clinicians guide treatment decisions. Understanding complex genetic interactions and developing clinical risk prediction models, advanced cardiac imaging, and improving mortality outcomes are just a few areas where AI has been applied in the domain of coronary artery disease. Through this review, we sought to summarize the advances in AI relating to coronary artery disease, current limitations, and future perspectives.

“…In this study, we have excluded patients with missing data and performed complete case analysis. In future study, we plan to apply advanced missing data imputation techniques [33][34][35] to relax this exclusion criteria and investigate the potential links between missing data and social determinants of health.…”

Section: Discussion (mentioning)

confidence: 99%

Comparison between machine learning methods for mortality prediction for sepsis patients with different social determinants

Wang

Naidech

et al. 2022

BMC Med Inform Decis Mak

Self Cite

Background Sepsis is one of the most life-threatening circumstances for critically ill patients in the United States, while diagnosis of sepsis is challenging as a standardized criteria for sepsis identification is still under development. Disparities in social determinants of sepsis patients can interfere with the risk prediction performances using machine learning. Methods We analyzed a cohort of critical care patients from the Medical Information Mart for Intensive Care (MIMIC)-III database. Disparities in social determinants, including race, sex, marital status, insurance types and languages, among patients identified by six available sepsis criteria were revealed by forest plots with 95% confidence intervals. Sepsis patients were then identified by the Sepsis-3 criteria. Sixteen machine learning classifiers were trained to predict in-hospital mortality for sepsis patients on a training set constructed by random selection. The performance was measured by area under the receiver operating characteristic curve (AUC). The performance of the trained model was tested on the entire randomly conducted test set and each sub-population built based on each of the following social determinants: race, sex, marital status, insurance type, and language. The fluctuations in performances were further examined by permutation tests. Results We analyzed a total of 11,791 critical care patients from the MIMIC-III database. Within the population identified by each sepsis identification method, significant differences were observed among sub-populations regarding race, marital status, insurance type, and language. On the 5783 sepsis patients identified by the Sepsis-3 criteria statistically significant performance decreases for mortality prediction were observed when applying the trained machine learning model on Asian and Hispanic patients, as well as the Spanish-speaking patients. 
With pairwise comparison, we detected performance discrepancies in mortality prediction between Asian and White patients, Asians and patients of other races, as well as English-speaking and Spanish-speaking patients. Conclusions Disparities in proportions of patients identified by various sepsis criteria were detected among the different social determinant groups. The performances of mortality prediction for sepsis patients can be compromised when applying a universally trained model for each subpopulation. To achieve accurate diagnosis, a versatile diagnostic system for sepsis is needed to overcome the social determinant disparities of patients.

“…Therefore, it falsely predicted the patient as positive for lupus nephritis. In future work, we plan to apply advanced imputation methods [29,30] to fill in missing laboratory tests in order to further improve the phenotyping performance.…”

Section: Error Analysis (mentioning)

confidence: 99%

Natural Language Processing to Identify Cancer Treatments With Electronic Medical Records

Zeng

Banerjee

Henry

et al. 2021

JCO Clinical Cancer Informatics

PURPOSE Knowing the treatments administered to patients with cancer is important for treatment planning and correlating treatment patterns with outcomes for personalized medicine study. However, existing methods to identify treatments are often lacking. We develop a natural language processing approach with structured electronic medical records and unstructured clinical notes to identify the initial treatment administered to patients with cancer. METHODS We used a total number of 4,412 patients with 483,782 clinical notes from the Stanford Cancer Institute Research Database containing patients with nonmetastatic prostate, oropharynx, and esophagus cancer. We trained treatment identification models for each cancer type separately and compared performance of using only structured, only unstructured ( bag-of-words, doc2vec, fasttext), and combinations of both ( structured + bow, structured + doc2vec, structured + fasttext). We optimized the identification model among five machine learning methods (logistic regression, multilayer perceptrons, random forest, support vector machines, and stochastic gradient boosting). The treatment information recorded in the cancer registry is the gold standard and compares our methods to an identification baseline with billing codes. RESULTS For prostate cancer, we achieved an f1-score of 0.99 (95% CI, 0.97 to 1.00) for radiation and 1.00 (95% CI, 0.99 to 1.00) for surgery using structured + doc2vec. For oropharynx cancer, we achieved an f1-score of 0.78 (95% CI, 0.58 to 0.93) for chemoradiation and 0.83 (95% CI, 0.69 to 0.95) for surgery using doc2vec. For esophagus cancer, we achieved an f1-score of 1.0 (95% CI, 1.0 to 1.0) for both chemoradiation and surgery using all combinations of structured and unstructured data. We found that employing the free-text clinical notes outperforms using the billing codes or only structured data for all three cancer types. 
CONCLUSION Our results show that treatment identification using free-text clinical notes greatly improves upon the performance using billing codes and simple structured data. The approach can be used for treatment cohort identification and adapted for longitudinal cancer treatment identification.
