Accessing information on an underlying network driving a biological process often involves interrupting the process and collecting snapshot data. When snapshot data are stochastic, the data’s structure necessitates a probabilistic description to infer underlying reaction networks. As an example, we may imagine wanting to learn gene state networks from the type of data collected in single molecule RNA fluorescence in situ hybridization (RNA-FISH). In the networks we consider, nodes represent network states, and edges represent biochemical reaction rates linking states. Simultaneously estimating the number of nodes and constituent parameters from snapshot data remains a challenging task in part on account of data uncertainty and timescale separations between kinetic parameters mediating the network. While parametric Bayesian methods learn parameters given a network structure (with known node numbers) with rigorously propagated measurement uncertainty, learning the number of nodes and parameters with potentially large timescale separations remain open questions. Here, we propose a Bayesian nonparametric framework and describe a hybrid Bayesian Markov Chain Monte Carlo (MCMC) sampler directly addressing these challenges. In particular, in our hybrid method, Hamiltonian Monte Carlo (HMC) leverages local posterior geometries in inference to explore the parameter space; Adaptive Metropolis Hastings (AMH) learns correlations between plausible parameter sets to efficiently propose probable models; and Parallel Tempering takes into account multiple models simultaneously with tempered information content to augment sampling efficiency. We apply our method to synthetic data mimicking single molecule RNA-FISH, a popular snapshot method in probing transcriptional networks to illustrate the identified challenges inherent to learning dynamical models from these snapshots and how our method addresses them.
Gene networks, key toward understanding a cell's regulatory response, underlie experimental observations of single cell transcriptional dynamics. While information on the gene network is encoded in RNA expression data, existing computational frameworks cannot currently infer gene networks from such data. Rather, gene networks--composed of gene states, their connectivities, and associated parameters--are currently deduced by pre-specifying gene state numbers and connectivity prior to learning associated rate parameters. As such, the correctness of gene networks cannot be independently assessed which can lead to strong biases. By contrast, here we propose a method to learn full distributions over gene states, state connectivities, and associated rate parameters, simultaneously and self-consistently from single molecule level RNA counts. Notably, our method propagates noise originating from fluctuating RNA counts over networks warranted by the data by treating networks themselves as random variables. We achieve this by operating within a Bayesian nonparametric paradigm. We demonstrate our method on the lacZ pathway in Escherichia coli cells, the STL1 pathway in Saccharomyces cerevisiae yeast cells, and verify its robustness on synthetic data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.