A method is presented for flexibly aligning small molecules. The method accepts a collection of small molecules with 3D coordinates as input and computes a collection of alignments. Each alignment is given a score, which quantifies the quality of the alignment both in terms of internal strain and overlap of molecular features. The results of several computational experiments on pairs of compounds with known binding conformations are used to systematically and objectively tune the parameters for the method. The results indicate the method's utility for the elucidation of pharmacophores and comparative field analysis.
In this paper, the relationships between the eigenvalues of the Gram matrix for a kernel () corresponding to a sample 1. .. drawn from a density () and the eigenvalues of the corresponding continuous eigenproblem is analyzed. The differences between the two spectra are bounded and a performance bound on kernel principal component analysis (PCA) is provided showing that good performance can be expected even in very-high-dimensional feature spaces provided the sample eigenvalues fall sufficiently quickly.
Abstract:Though it is widely recognized that data sharing enables faster scientific progress, the sensible need to protect participant privacy hampers this practice in medicine. We train deep neural networks that generate synthetic participants closely resembling study participants. Using the SPRINT trial as an example, we show that machine-learning models built from simulated participants generalize to the original dataset. We incorporate differential privacy, which offers strong guarantees on the likelihood that a participant could be identified as a member of the trial. Investigators who have compiled a dataset can use our method to provide a freely accessible public version that enables other scientists to perform discovery-oriented analyses. Generated data can be released alongside analytical code to enable fully reproducible workflows, even when privacy is a concern. By addressing data sharing challenges, deep neural networks can facilitate the rigorous and reproducible investigation of clinical datasets.
IntroductionSharing individual-level data from clinical studies remains challenging. The status quo often requires scientists to establish a formal collaboration and execute extensive data usage agreements before sharing data. These requirements slow or even prevent data sharing between researchers in all but the closest collaborations.
Background:
Data sharing accelerates scientific progress but sharing individual-level data while preserving patient privacy presents a barrier.
Methods and Results:
Using pairs of deep neural networks, we generated simulated, synthetic participants that closely resemble participants of the SPRINT trial (Systolic Blood Pressure Trial). We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants’ data could identify a real a participant in the trial. Machine learning predictors built on the synthetic population generalize to the original data set. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data.
Conclusions:
Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical data sets by enhancing data sharing while preserving participant privacy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.