Nonlinear manifold learning algorithms, such as diffusion maps, have been fruitfully applied in recent years to the analysis of large and complex data sets. However, such algorithms still encounter challenges when faced with real data. One such challenge is the existence of "repeated eigendirections," which arise when several embedding coordinates parametrize the same direction in the intrinsic geometry of the data set and which obscure the detection of the true dimensionality of the underlying manifold. We propose an algorithm, based on local linear regression, to automatically detect coordinates corresponding to repeated eigendirections. We construct a more parsimonious embedding using only the eigenvectors corresponding to unique eigendirections, and we show that this reduced diffusion maps embedding induces a metric which is equivalent to the standard diffusion distance. We first demonstrate the utility and flexibility of our approach on synthetic data sets. We then apply our algorithm to data collected from a stochastic model of cellular chemotaxis, where our approach for factoring out repeated eigendirections allows us to detect changes in dynamical behavior and the underlying intrinsic system dimensionality directly from data.
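A minimal NumPy sketch of the local-linear-regression test, under stated assumptions: the columns of `phi` are the nontrivial eigenvectors (constant eigenvector removed), each coordinate phi_k is regressed on phi_1, ..., phi_{k-1} with a Gaussian-weighted, leave-one-out local linear fit, and the normalized residual r_k is reported. The function name `local_linear_residuals`, the Gaussian weights, and the bandwidth heuristic (median squared distance divided by `eps_scale`) are illustrative assumptions, not the authors' code. To stay independent of any diffusion-maps implementation, the usage builds the analytic eigenfunctions of a 4-by-1 strip instead of computing them.

```python
import numpy as np

def local_linear_residuals(phi, eps_scale=3.0):
    """Normalized leave-one-out residual r_k for each column k of phi.

    Small r_k: phi_k is (locally) a function of earlier columns -> repeated eigendirection.
    r_k near 1: phi_k parametrizes a new direction. r_0 is set to 1 by convention.
    """
    n, m = phi.shape
    r = np.ones(m)
    for k in range(1, m):
        A = phi[:, :k]                                  # earlier coordinates (regressors)
        y = phi[:, k]                                   # coordinate being tested
        sq = np.sum((A[:, None, :] - A[None, :, :]) ** 2, axis=-1)
        eps = np.median(sq) / eps_scale                 # assumed bandwidth heuristic
        W = np.exp(-sq / eps)                           # Gaussian regression weights
        np.fill_diagonal(W, 0.0)                        # leave-one-out: drop point i itself
        X = np.hstack([np.ones((n, 1)), A])             # affine (local linear) design matrix
        preds = np.empty(n)
        for i in range(n):
            sw = np.sqrt(W[i])
            # Weighted least squares solved via lstsq on sqrt-weighted rows.
            beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
            preds[i] = X[i] @ beta
        r[k] = np.sqrt(np.sum((y - preds) ** 2) / np.sum(y ** 2))
    return r

# Usage: analytic eigenfunctions of a 4 x 1 strip stand in for diffusion-map eigenvectors.
rng = np.random.default_rng(0)
x_pos = rng.uniform(0.0, 4.0, 300)
y_pos = rng.uniform(0.0, 1.0, 300)
phi = np.column_stack([np.cos(np.pi * x_pos / 4),   # first harmonic along x
                       np.cos(np.pi * x_pos / 2),   # higher harmonic along x (repeated)
                       np.cos(np.pi * y_pos)])      # first harmonic along y (new)
r = local_linear_residuals(phi, eps_scale=10.0)
```

Here cos(pi x/2) is a smooth function of cos(pi x/4), so it should receive a small residual, whereas cos(pi y) should receive a residual near 1; a threshold on r_k would then select the coordinates kept in the reduced embedding.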
Understanding the mechanisms by which proteins fold from disordered amino-acid chains to spatially ordered structures remains an area of active inquiry. Molecular simulations can provide atomistic details of the folding dynamics which complement experimental findings. Conventional order parameters, such as root-mean-square deviation and radius of gyration, provide structural information but fail to capture the underlying dynamics of the protein folding process. It is therefore advantageous to adopt a method that can systematically analyze simulation data to extract relevant structural as well as dynamical information. The nonlinear dimensionality reduction technique known as diffusion maps automatically embeds the high-dimensional folding trajectories in a lower-dimensional space from which one can more easily visualize folding pathways, assuming the data lie approximately on a lower-dimensional manifold. The eigenvectors that parametrize the low-dimensional space, furthermore, are determined systematically, rather than chosen heuristically, as is done with phenomenological order parameters. We demonstrate that diffusion maps can effectively characterize the folding process of a Trp-cage miniprotein. By embedding molecular dynamics simulation trajectories of Trp-cage folding in diffusion maps space, we identify two folding pathways and intermediate structures that are consistent with previous studies, demonstrating that this technique can be employed as an effective way of analyzing and constructing protein folding pathways from molecular simulations.
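For readers unfamiliar with the method, a minimal diffusion-maps sketch in NumPy follows. It assumes a Gaussian kernel with a user-chosen bandwidth `epsilon`, density normalization with `alpha = 1`, and diffusion time 1; the function name and defaults are illustrative and not taken from the Trp-cage study, which embedded molecular dynamics configurations rather than the toy curve used here.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, alpha=1.0):
    """Leading nontrivial diffusion-map coordinates (diffusion time 1) of the rows of X.

    Dense O(n^2) sketch: suitable for a few thousand samples at most.
    """
    # Pairwise squared Euclidean distances between samples (rows of X).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel; epsilon sets the neighborhood size.
    K = np.exp(-sq_dists / epsilon)
    # alpha-normalization: alpha = 1 removes the effect of nonuniform sampling density.
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q ** alpha, q ** alpha)
    # Row sums define the Markov matrix P = D^{-1} K_alpha; diagonalize its
    # symmetric conjugate S = D^{-1/2} K_alpha D^{-1/2} so that eigh can be used.
    d = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]               # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(d)[:, None]              # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by eigenvalues.
    lam = eigvals[1:n_components + 1]
    return lam, psi[:, 1:n_components + 1] * lam

# Usage: noise-free samples along a half circle, a one-dimensional manifold in 2-D.
t = np.linspace(0.0, 1.0, 200)
X = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])
lam, coords = diffusion_map(X, epsilon=0.05)
```

The first column of `coords` tracks position along the arc (up to sign), recovering the single intrinsic coordinate without it being specified in advance, which is the property that makes the eigenvectors usable in place of hand-picked order parameters.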
Transient activation of the highly conserved extracellular signal-regulated kinase (ERK) establishes precise patterns of cell fates in developing tissues. Quantitative parameters of these transients are essentially unknown, but a growing number of studies suggest that changes in these parameters can lead to a broad spectrum of developmental abnormalities. We provide a detailed quantitative picture of an ERK-dependent inductive signaling event in the early Drosophila embryo, an experimental system that offers unique opportunities for high-throughput studies of developmental signaling. Our analysis reveals a spatiotemporal pulse of ERK activation that is consistent with a model in which transient production of a short-ranged ligand feeds into a simple signal interpretation system. The pulse of ERK signaling acts as a switch in controlling the expression of the ERK-target gene. The quantitative approach that led to this model, based on the integration of data from fixed embryos and live imaging, can be extended to other developmental systems patterned by transient inductive signals.
The adoption of detailed mechanisms for chemical kinetics often poses two types of severe challenges: first, the number of degrees of freedom is large; and second, the dynamics is characterized by widely disparate time scales. As a result, reactive flow solvers with detailed chemistry often become intractable even for large clusters of CPUs, especially when dealing with direct numerical simulation (DNS) of turbulent combustion problems. This has motivated the development of several techniques for reducing the complexity of such kinetics models, where eventually only a few variables are considered in the development of the simplified model. Unfortunately, no generally applicable a priori recipe for selecting suitable parameterizations of the reduced model is available, and the choice of slow variables often relies upon intuition and experience. We present an automated approach to this task, consisting of three main steps. First, the low-dimensional manifold of slow motions is (approximately) sampled by brief simulations of the detailed model, starting from a rich enough ensemble of admissible initial conditions. Second, a global parametrization of the manifold is obtained through the Diffusion Map (DMAP) approach, which has recently emerged as a powerful tool in data analysis/machine learning. Finally, a simplified model is constructed and solved on the fly in terms of the above reduced (slow) variables. Clearly, closing this latter model requires nontrivial interpolation calculations, enabling restriction (mapping from the full ambient space to the reduced one) and lifting (mapping from the reduced space to the ambient one). This is a key step in our approach, and a variety of interpolation schemes are reported and compared. The scope of the proposed procedure is presented and discussed by means of an illustrative combustion example.
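Restriction (ambient to reduced coordinates) can be illustrated with the Nyström extension, a standard way to evaluate diffusion-map coordinates at points outside the original sample. This sketch assumes a Gaussian kernel with plain row normalization (alpha = 0), so the extension reproduces the eigenvectors exactly at the original samples; the function names `fit_dmap` and `restrict` are hypothetical, and this abstract does not say whether the Nyström formula is among the interpolation schemes the paper compares. Lifting (reduced to ambient) needs a separate interpolant and is not shown.

```python
import numpy as np

def fit_dmap(X, epsilon, n_components):
    """Diffusion map with plain row normalization (alpha = 0).

    Returns the nontrivial eigenvalues and right eigenvectors of P = D^{-1} K.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / epsilon)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))                    # symmetric conjugate of P
    vals, vecs = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][1:n_components + 1]   # skip the trivial eigenpair
    psi = vecs[:, idx] / np.sqrt(d)[:, None]           # right eigenvectors of P
    return vals[idx], psi

def restrict(X_new, X, epsilon, vals, psi):
    """Nystrom extension: map ambient points X_new (m, D) to reduced coordinates.

    psi_j(x) = (1 / lambda_j) * sum_i p(x, x_i) * psi_j(x_i)
    """
    sq = np.sum((X_new[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / epsilon)
    P = K / K.sum(axis=1, keepdims=True)               # transition probabilities to samples
    return (P @ psi) / vals

# Usage: samples on a half circle; restrict the unsampled apex point (0, 1).
t = np.linspace(0.0, 1.0, 150)
X = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])
vals, psi = fit_dmap(X, epsilon=0.05, n_components=2)
y_mid = restrict(np.array([[0.0, 1.0]]), X, 0.05, vals, psi)
```

Because each new point only needs one row of kernel evaluations against the stored samples, this kind of extension is cheap enough to be called repeatedly while a reduced model is advanced on the fly.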
arXiv:1307.6849v1 [math.DS]
… the development of a plethora of approaches aiming at reducing the computational complexity of such detailed combustion models, ideally by recasting them in terms of only a few new reduced variables (see, e.g., [1] and references therein). The implementation of many of these techniques typically involves three successive steps. First, a large set of stiff ordinary differential equations (ODEs) is considered for modeling the temporal evolution of a spatially homogeneous mixture of chemical species under specified stoichiometric and thermodynamic conditions (usually fixed total enthalpy and pressure for combustion in the low Mach regime). It is well known that, due to the presence of fast and slow dynamics, the above systems are characterized by low-dimensional manifolds in the concentration space (or phase space), where a typical solution trajectory is initially rapidly attracted towards the manifold and afterwards proceeds to the thermodynamic equilibrium point while always remaining in close proximity to the manifold. Clearly, the presence of a manifold for...
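The fast attraction followed by slow drift along the manifold can be seen in a standard two-variable toy problem, the Davis-Skodje model, which is not part of the paper: dy1/dt = -y1, dy2/dt = -gamma*y2 + ((gamma - 1)*y1 + gamma*y1^2)/(1 + y1)^2, whose slow invariant manifold is y2 = y1/(1 + y1). The sketch below integrates it with explicit Euler, which is adequate only because gamma*dt is much smaller than 1 here; genuinely stiff combustion mechanisms call for implicit solvers. The parameter values and function names are illustrative assumptions.

```python
import numpy as np

def davis_skodje_rhs(y, gamma):
    """Right-hand side of the Davis-Skodje fast/slow toy model."""
    y1, y2 = y
    dy1 = -y1                                                   # slow variable
    dy2 = -gamma * y2 + ((gamma - 1.0) * y1 + gamma * y1 ** 2) / (1.0 + y1) ** 2
    return np.array([dy1, dy2])

def simulate(y0, gamma=20.0, dt=1e-3, t_end=3.0):
    """Explicit Euler integration; stable here because gamma * dt = 0.02."""
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, 2))
    traj[0] = y0
    for n in range(n_steps):
        traj[n + 1] = traj[n] + dt * davis_skodje_rhs(traj[n], gamma)
    return traj

def distance_to_slow_manifold(traj):
    """Vertical distance to the slow invariant manifold y2 = y1 / (1 + y1)."""
    return np.abs(traj[:, 1] - traj[:, 0] / (1.0 + traj[:, 0]))

# Usage: start well off the manifold and watch the deviation collapse.
traj = simulate(np.array([1.0, 2.0]))
dist = distance_to_slow_manifold(traj)
```

One can check that z = y2 - y1/(1 + y1) obeys dz/dt = -gamma*z exactly, so off-manifold deviations decay like exp(-gamma*t) while y1 itself decays only like exp(-t); brief simulations from many initial conditions therefore end up sampling points close to the manifold.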
Multiple time scale stochastic dynamical systems are ubiquitous in science and engineering, and the reduction of such systems and their models to only their slow components is often essential for scientific computation and further analysis. Rather than being available in the form of an explicit analytical model, often such systems can only be observed as a data set which exhibits dynamics on several time scales. We will focus on applying and adapting data mining and manifold learning techniques to detect the slow components in such multiscale data. Traditional data mining methods are based on metrics (and thus, geometries) which are not informed of the multiscale nature of the underlying system dynamics; such methods cannot successfully recover the slow variables. Here, we present an approach which utilizes both the local geometry and the local dynamics within the data set through a metric which is both insensitive to the fast variables and more general than simple statistical averaging. Our analysis of the approach provides conditions for successfully recovering the underlying slow variables, as well as an empirical protocol guiding the selection of the method parameters.
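One common way to build a metric that is insensitive to fast variables, assumed here rather than stated in the abstract, is a Mahalanobis-type distance: from each data point a short burst of simulations estimates a local covariance C_i, and squared distances are 0.5*(x_i - x_j)^T (C_i^+ + C_j^+)(x_i - x_j), with ^+ the pseudo-inverse. Directions with large short-time variance (the fast, noisy variables) are down-weighted. The function names are hypothetical, and the usage fakes the bursts with Gaussian noise instead of running an actual stochastic simulation.

```python
import numpy as np

def local_covariances(bursts):
    """Sample covariance of each burst.

    bursts: array of shape (n, b, D), b short simulations started from each of n points.
    Returns an (n, D, D) array of local covariance estimates.
    """
    centered = bursts - bursts.mean(axis=1, keepdims=True)
    return np.einsum("nbi,nbj->nij", centered, centered) / (bursts.shape[1] - 1)

def mahalanobis_sq_distances(X, covs):
    """Squared distances 0.5 * (x_i - x_j)^T (C_i^+ + C_j^+) (x_i - x_j) for all pairs."""
    inv = np.linalg.pinv(covs)                        # batched pseudo-inverses, (n, D, D)
    diff = X[:, None, :] - X[None, :, :]              # (n, n, D)
    M = inv[:, None, :, :] + inv[None, :, :, :]       # (n, n, D, D); dense, small n only
    return 0.5 * np.einsum("nmi,nmij,nmj->nm", diff, M, diff)

# Usage: variable 0 is slow (small short-time spread), variable 1 is fast (large spread).
rng = np.random.default_rng(1)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
noise = rng.normal(size=(3, 500, 2)) * np.array([0.1, 1.0])
bursts = X[:, None, :] + noise
D2 = mahalanobis_sq_distances(X, local_covariances(bursts))
```

With these spreads a unit step along the fast variable gives a squared distance near 1, while the same step along the slow variable gives one near 100, so a kernel method built on this metric would emphasize the slow direction.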