Background
Precision medicine requires a stratification of patients by disease presentation that is sufficiently informative to allow treatments to be selected on a per-patient basis. For many diseases, such as neurological disorders, this stratification problem translates into the complex problem of clustering multivariate and relatively short time series, because (i) these diseases are multifactorial and not well described by single clinical outcome variables and (ii) disease progression needs to be monitored over time. Additionally, clinical data often contain many missing values, which further complicates any clustering attempt.
Findings
The problem of clustering multivariate short time series with many missing values is generally not well addressed in the literature. In this work, we propose variational deep embedding with recurrence (VaDER), a deep learning–based method that addresses this issue. VaDER relies on a Gaussian mixture variational autoencoder framework, which we extend to (i) model multivariate time series and (ii) deal directly with missing values. We validated VaDER by accurately recovering clusters from simulated and benchmark data with known ground-truth clustering while varying the degree of missingness. We then used VaDER to successfully stratify patients with Alzheimer disease and patients with Parkinson disease into subgroups characterized by clinically divergent disease progression profiles. Additional analyses demonstrated that these clinical differences reflected known underlying aspects of Alzheimer disease and Parkinson disease.
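The abstract states that VaDER deals with missing values directly but does not say how. One common approach for autoencoders, assumed here rather than taken from the paper, is to compute the reconstruction loss only over observed entries, so that missing values never contribute to the loss. The minimal NumPy sketch that follows shows such a masked loss for data shaped (patients, time points, variables), assuming missing values are encoded as NaN. The function names `observation_mask` and `masked_mse` are hypothetical, and VaDER's recurrent encoder/decoder and Gaussian mixture prior are not implemented.

```python
import numpy as np

def observation_mask(x):
    """1.0 where a value was recorded, 0.0 where it is missing (NaN)."""
    return (~np.isnan(x)).astype(float)

def masked_mse(x, x_hat, mask):
    """Mean squared reconstruction error over observed entries only.

    x, x_hat, mask: arrays of shape (n_patients, n_timepoints, n_variables).
    Missing entries contribute nothing, so the model is never asked to
    reproduce values that were not measured.
    """
    # Substitute 0.0 at missing positions so NaNs cannot leak into the sum;
    # the mask then zeroes out their squared error anyway.
    x_filled = np.where(mask > 0, x, 0.0)
    sq_err = mask * (x_filled - x_hat) ** 2
    n_observed = mask.sum()
    # Guard against a batch in which nothing was observed.
    return float(sq_err.sum() / n_observed) if n_observed > 0 else 0.0

# Usage: 1 patient, 2 visits, 2 variables; the second variable is missing at visit 1.
x = np.array([[[1.0, np.nan], [2.0, 3.0]]])
x_hat = np.array([[[1.5, 100.0], [2.0, 2.0]]])  # 100.0 sits on the missing entry
loss = masked_mse(x, x_hat, observation_mask(x))  # (0.25 + 0.0 + 1.0) / 3
```

The large prediction at the missing position has no effect on the loss; in a full Gaussian mixture VAE this term would be combined with the KL divergence to the mixture prior.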
Conclusions
We believe our results show that VaDER can be of great value for future efforts in patient stratification, and multivariate time-series clustering in general.
Kernel principal component analysis (KPCA) has been widely used in nonlinear process monitoring because it can capture nonlinear process characteristics. However, it suffers from high computational complexity and poor scalability when applied to real-time and large-scale process monitoring. In this paper, a novel dimension reduction technique, local and global randomized principal component analysis (LGRPCA), is proposed for nonlinear process monitoring. The proposed LGRPCA method first maps the input space onto a feature space through random Fourier features to reveal nonlinear patterns. With the aid of random Fourier features, LGRPCA is scalable and has much lower computational and storage costs. To exploit both the local and the global structure information in the feature space, local structure analysis is integrated into the framework of global variance information extraction. The resulting LGRPCA can provide a better representation of the input data than traditional KPCA, which makes it well suited to real-time and large-scale process monitoring. T² and squared prediction error (SPE) statistic control charts are built on LGRPCA for fault detection. Furthermore, contribution plots for the LGRPCA-based T² and SPE (Q) statistics are derived from a sensitivity analysis principle to identify the root-cause variables. The superior performance of the proposed LGRPCA-based nonlinear process monitoring method is demonstrated through a numerical example and a comparative study on the Tennessee Eastman benchmark process.
INDEX TERMS: Principal component analysis, random Fourier features, local and global structure analysis, fault detection, fault identification.
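The general idea of replacing the kernel trick with an explicit random Fourier feature map, running linear PCA in that feature space, and monitoring with T² and SPE statistics can be sketched in NumPy. This is plain random-feature PCA under stated assumptions, not LGRPCA itself: it assumes an RBF kernel with bandwidth `sigma`, sets control limits as empirical percentiles of the training statistics, and uses hypothetical helper names (`rff_map`, `fit_rff_pca`, `monitoring_stats`, `empirical_limits`). The local structure analysis and the contribution plots described in the abstract are not included.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier features: z(x) = sqrt(2/D) * cos(W^T x + b).

    With W ~ N(0, sigma^-2 I) and b ~ U[0, 2*pi], z(x) @ z(y) approximates
    the RBF kernel exp(-||x - y||^2 / (2 sigma^2)).
    """
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def fit_rff_pca(X_train, n_features=200, n_components=5, sigma=1.0, seed=0):
    """Linear PCA on an explicit random-feature map of normal operating data."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Z = rff_map(X_train, W, b)
    mu = Z.mean(axis=0)
    # SVD of the centred feature matrix gives the principal directions.
    _, s, Vt = np.linalg.svd(Z - mu, full_matrices=False)
    P = Vt[:n_components].T                         # loadings (D x k)
    lam = s[:n_components] ** 2 / (Z.shape[0] - 1)  # variances of the scores
    return {"W": W, "b": b, "mu": mu, "P": P, "lam": lam}

def monitoring_stats(model, X):
    """Hotelling T^2 and SPE (Q) statistic for each row of X."""
    Zc = rff_map(X, model["W"], model["b"]) - model["mu"]
    T = Zc @ model["P"]                             # principal scores
    t2 = np.sum(T ** 2 / model["lam"], axis=1)
    residual = Zc - T @ model["P"].T                # part outside the PC subspace
    spe = np.sum(residual ** 2, axis=1)
    return t2, spe

def empirical_limits(model, X_train, percentile=99.0):
    """Control limits taken as empirical percentiles of the training statistics."""
    t2, spe = monitoring_stats(model, X_train)
    return np.percentile(t2, percentile), np.percentile(spe, percentile)

# Usage on synthetic data: fit on "normal" samples, then score shifted samples.
rng = np.random.default_rng(1)
X_normal = rng.normal(size=(500, 4))
model = fit_rff_pca(X_normal)
t2_limit, spe_limit = empirical_limits(model, X_normal)
t2_fault, spe_fault = monitoring_stats(model, X_normal[:5] + 6.0)
fault_flags = (t2_fault > t2_limit) | (spe_fault > spe_limit)
```

Because the feature map is explicit, the cost of fitting grows with the number of random features D instead of requiring an n × n kernel matrix, which is the scalability argument the abstract makes for random Fourier features over KPCA.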
One of the visions of precision medicine has been to redefine disease taxonomies based on molecular characteristics rather than on phenotypic evidence. However, achieving this goal is highly challenging, particularly in neurology. Our contribution is a machine-learning-based joint molecular subtyping of Alzheimer’s disease (AD) and Parkinson’s disease (PD), based on the genetic burden of 15 molecular mechanisms comprising 27 proteins (e.g. APOE) that have been described in both diseases. We demonstrate that our joint AD/PD clustering, which combines sparse autoencoders and sparse non-negative matrix factorization, is reproducible and is associated with significant differences between AD and PD patient subgroups at the clinical, pathophysiological and molecular levels. Hence, the clusters are disease-associated. To our knowledge, this work is the first demonstration of a mechanism-based stratification in the field of neurodegenerative diseases. Overall, we see this work as an important step towards a molecular mechanism-based taxonomy of neurological disorders, which could help in developing better targeted therapies in the future by going beyond classical phenotype-based disease definitions.
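The clustering step can be illustrated with sparse non-negative matrix factorization alone. The sketch assumes a nonnegative matrix of mechanism-level genetic burden scores with mechanisms as rows and patients as columns, uses the standard multiplicative updates with an L1 penalty on the coefficient matrix H, and assigns each patient to the component with the largest coefficient. It omits the sparse autoencoder stage and is not the authors' pipeline; `sparse_nmf`, `nmf_clusters`, the penalty weight `l1`, and the toy data are all hypothetical.

```python
import numpy as np

def sparse_nmf(V, k, l1=0.1, n_iter=500, seed=0, eps=1e-10):
    """Approximate a nonnegative matrix V (m x n) as W @ H with an L1 penalty on H.

    Multiplicative updates for 0.5 * ||V - W H||_F^2 + l1 * sum(H).
    Rows of V are features (e.g. mechanism burden scores), columns are samples.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # The L1 penalty enters as a constant added to the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + l1 + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Rescale W to unit-norm columns; W @ H is unchanged, and the penalty
        # on H cannot be dodged by shrinking H while inflating W.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

def nmf_clusters(H):
    """Assign each sample (column) to the component with the largest coefficient."""
    return H.argmax(axis=0)

# Usage: two hypothetical patient groups with distinct mechanism profiles.
V = np.zeros((4, 6))
V[:2, :3] = 1.0  # mechanisms 1-2 elevated in patients 1-3
V[2:, 3:] = 1.0  # mechanisms 3-4 elevated in patients 4-6
W, H = sparse_nmf(V, k=2)
labels = nmf_clusters(H)
```

The L1 term pushes each patient's coefficients towards a few components, which makes the argmax assignment less ambiguous than with unpenalized NMF.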