Background
The application of data-driven methods is expected to play an increasingly important role in healthcare. However, a lack of personnel with the necessary skills to develop these models and interpret its output is preventing a wider adoption of these methods. To address this gap, we introduce and describe ORIENTATE, a software for automated application of machine learning classification algorithms by clinical practitioners lacking specific technical skills. ORIENTATE allows the selection of features and the target variable, then automatically generates a number of classification models and cross-validates them, finding the best model and evaluating it. It also implements a custom feature selection algorithm for systematic searches of the best combination of predictors for a given target variable. Finally, it outputs a comprehensive report with graphs that facilitates the explanation of the classification model results, using global interpretation methods, and an interface for the prediction of new input samples. Feature relevance and interaction plots provided by ORIENTATE allow to use it for statistical inference, which can replace and/or complement classical statistical studies.
Results
Its application to a dataset with healthy and special health care needs (SHCN) children, treated under deep sedation, was discussed as case study. On the example dataset, despite its small size, the feature selection algorithm found a set of features able to predict the need for a second sedation with a f1 score of 0.83 and a ROC (AUC) of 0.92. Eight predictive factors for both populations were found and ordered by the relevance assigned to them by the model. A discussion of how to derive inferences from the relevance and interaction plots and a comparison with a classical study is also provided.
Conclusions
ORIENTATE automatically finds suitable features and generates accurate classifiers which can be used in preventive tasks. In addition, researchers without specific skills on data methods can use it for the application of machine learning classification and as a complement to classical studies for inferential analysis of features. In the case study, a high prediction accuracy for a second sedation in SHCN children was achieved. The analysis of the relevance of the features showed that the number of teeth with pulpar treatments at the first sedation is a predictive factor for a second sedation.