“…variables) that ML can better utilize to represent the problem/target outcome; (2) feature selection, which applies expert domain knowledge, statistical methods, and/or ML methods to remove 'irrelevant' features from consideration and improve downstream modeling; (3) data harmonization, which allows data collected at different sites/institutions to be integrated; (4) handling different outcomes and related challenges, e.g., binary classification, multi-class classification, quantitative phenotypes, class imbalance, temporal data, multi-labeled data, censored data, and the use of appropriate evaluation metrics; (5) ML algorithm selection, which can be a challenge in itself for a given problem, so strategies that integrate the predictions of multiple machine learners as an ensemble are likely to be important; (6) ML modeling pipeline assembly, including critical considerations such as hyperparameter optimization, accounting for overfitting, and the clinical interpretability of trained models; and (7) considering and accounting for covariates, as well as sources of bias in data collection, study design, and the application of ML tools, in order to avoid drawing conclusions based on spurious correlations.…”
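Two of the points above, appropriate evaluation metrics under class imbalance (4) and ensembling the predictions of several learners (5), can be illustrated with a minimal sketch. It uses only the Python standard library. The function names (`accuracy`, `balanced_accuracy`, `majority_vote`) and the toy labels are hypothetical and were invented for illustration. They are not taken from the quoted work, and hard majority voting is only one of many possible ensembling strategies. The sketch shows that a classifier which always predicts the majority class can look strong on plain accuracy while balanced accuracy (the mean per-class recall) exposes it as no better than chance.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    """Reject empty or mismatched label sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall, so each class counts equally regardless of its size."""
    _check(y_true, y_pred)
    recalls = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Hard-voting ensemble: per sample, return the label predicted by the most models.

    Ties go to the label that appears first in model order, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    return [Counter(model[i] for model in predictions).most_common(1)[0][0] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: 90 controls (0) and 10 cases (1).
    y_true = [0] * 90 + [1] * 10
    always_control = [0] * 100  # a useless model that never predicts a case
    print("accuracy:", accuracy(y_true, always_control))                    # 0.9
    print("balanced accuracy:", balanced_accuracy(y_true, always_control))  # 0.5

    # Three hypothetical models' predictions on four samples, combined by vote.
    print(majority_vote([[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0]]))  # [0, 1, 1, 0]
```

Balanced accuracy is used here because it is simple to compute by hand. Depending on the outcome type, other metrics, such as a concordance index for censored data, would be needed instead.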