TO THE EDITOR: Recently, Winter and Hahn [1] commented on our work on identifying subtypes of major psychiatric disorders (MPDs) based on neurobiological features using machine learning [2]. They questioned the generalizability of our methods and the statistical significance, stability, and overfitting of the results, and proposed a pipeline for disease subtyping. We appreciate their earnest consideration of our work; however, we need to point out their misconceptions of basic machine-learning concepts and delineate some key issues involved.
Subtyping diseases
Subtyping diseases, such as MPDs in [2], is a task of clustering a set of unlabeled data (i.e., patients) with neither a definition of the target subtypes nor a known number of subtypes. Clustering is fundamentally different from classification, where training data with class labels are provided. When no subtype/class label is available, most concepts and techniques developed for classification do not apply. However, Winter and Hahn proposed a clustering pipeline [1] consisting of components primarily applicable to classification, including generalization, statistical significance testing, overfitting avoidance, and cross-validation.
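The distinction can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the one-dimensional feature values, the labels "A" and "B", and the function names `kmeans_1d`, `fit_centroids`, and `predict` are hypothetical and are not taken from [1] or [2]. Plain k-means stands in for clustering in general and is not the method used in [2].

```python
import random

def kmeans_1d(values, k, iters=50, seed=0):
    """Clustering: group unlabeled 1-D values into k clusters (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initialize centers from the data itself
    assign = []
    for _ in range(iters):
        # Assign each value to its nearest center.
        assign = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its members (unchanged if empty).
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign, centers

def fit_centroids(values, labels):
    """Classification: one centroid per class, defined by the given labels."""
    return {
        c: sum(v for v, l in zip(values, labels) if l == c) / labels.count(c)
        for c in set(labels)
    }

def predict(centroids, value):
    """Return the class whose centroid is nearest to the value."""
    return min(centroids, key=lambda c: abs(value - centroids[c]))

# Hypothetical feature values for six patients.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

# Clustering: no labels exist and k is a free choice, so there is no
# reference answer against which a partition could be scored as right or wrong.
assign_k2, centers_k2 = kmeans_1d(values, k=2)
assign_k3, centers_k3 = kmeans_1d(values, k=3)

# Classification: labels are given, so a prediction on a held-out case
# can be compared with that case's true label.
labels = ["A", "A", "A", "B", "B", "B"]
model = fit_centroids(values, labels)
held_out_prediction = predict(model, 4.5)
```

In the classification half, the held-out prediction can be checked against a known label, which is what makes notions such as generalization error and cross-validated accuracy well defined. In the clustering half, the partitions for k = 2 and k = 3 are both legitimate outputs, and no label is available to decide between them.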