Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps—including biophysically interpretable models—from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.
Complex spatiotemporal dynamics of physicochemical processes are often modeled at a microscopic level (through e.g. atomistic, agent-based or lattice models) based on first principles. Some of these processes can also be successfully modeled at the macroscopic level using e.g. partial differential equations (PDEs) describing the evolution of the right few macroscopic observables (e.g. concentration and momentum fields). Deriving good macroscopic descriptions (the so-called "closure problem") is often a time-consuming process requiring deep understanding of, and intuition about, the system of interest. Recent developments in data science provide alternative ways to effectively extract/learn accurate macroscopic descriptions approximating the underlying microscopic observations. In this paper, we introduce a data-driven framework for the identification of unavailable coarse-scale PDEs from microscopic observations via machine learning algorithms. Specifically, using Gaussian Processes, Artificial Neural Networks, and/or Diffusion Maps, the proposed framework uncovers the relation between the relevant macroscopic space fields and their time evolution (the right-hand side of the explicitly unavailable macroscopic PDE). Interestingly, several choices equally representative of the data can be discovered. The framework will be illustrated through the data-driven discovery of macroscopic, concentration-level PDEs resulting from a fine-scale, Lattice Boltzmann level model of a reaction/transport process. Once the coarse evolution law is identified, it can be simulated to produce long-term macroscopic predictions. Different features (pros as well as cons) of alternative machine learning algorithms for performing this task (Gaussian Processes and Artificial Neural Networks) are presented and discussed.
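The idea of learning the right-hand side of an unavailable PDE from field data can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: the "observations" are synthetic snapshots of a linear reaction/diffusion equation $u_t = D u_{xx} - k u$ rather than Lattice Boltzmann output, and plain linear least squares stands in for the Gaussian Process or neural-network regressors. All function names and parameter values are hypothetical.

```python
import numpy as np

def gradient(u, dx):
    """Central-difference first derivative on a periodic grid."""
    return (np.roll(u, -1, axis=-1) - np.roll(u, 1, axis=-1)) / (2.0 * dx)

def laplacian(u, dx):
    """Central-difference second derivative on a periodic grid."""
    return (np.roll(u, -1, axis=-1) - 2.0 * u + np.roll(u, 1, axis=-1)) / dx**2

def simulate(u0, D, k, dx, dt, n_steps):
    """Explicit Euler integration of u_t = D u_xx - k u (the 'hidden' law)."""
    snapshots = [u0]
    u = u0
    for _ in range(n_steps):
        u = u + dt * (D * laplacian(u, dx) - k * u)
        snapshots.append(u)
    return np.array(snapshots)

# Hypothetical "true" coefficients, unknown to the regression step below.
D_true, k_true = 0.1, 0.5
n_points, dt, n_steps = 64, 0.01, 200

x = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
dx = x[1] - x[0]
u0 = np.sin(x) + 0.5 * np.cos(3.0 * x)  # two Fourier modes, so features are not collinear

U = simulate(u0, D_true, k_true, dx, dt, n_steps)

# Estimate the time derivative from consecutive snapshots.
u_t = (U[1:] - U[:-1]) / dt
U_now = U[:-1]

# Candidate macroscopic features at each space-time point: u, u_x, u_xx.
features = np.stack(
    [U_now, gradient(U_now, dx), laplacian(U_now, dx)], axis=-1
).reshape(-1, 3)

# Fit u_t ~ c0*u + c1*u_x + c2*u_xx by linear least squares.
coeffs, *_ = np.linalg.lstsq(features, u_t.ravel(), rcond=None)
print("learned coefficients for (u, u_x, u_xx):", coeffs)
```

In this sketch the learned coefficients can be compared against `-k_true`, zero, and `D_true`. A nonlinear regressor would replace the `lstsq` call when the right-hand side is not linear in the chosen features.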
Large-scale dynamical systems (e.g. many nonlinear coupled differential equations) can often be summarized in terms of only a few state variables (a few equations), a trait that reduces complexity and facilitates exploration of behavioral aspects of otherwise intractable models. High model dimensionality and complexity make symbolic, pen-and-paper model reduction tedious and impractical, a difficulty addressed by recently developed frameworks that computerize reduction. Symbolic work has the benefit, however, of identifying both reduced state variables and the parameter combinations that matter most (effective parameters, "inputs"), whereas current computational reduction schemes leave the parameter reduction aspect mostly unaddressed. As the interest in mapping out and optimizing complex input-output relations keeps growing, it becomes clear that combating the curse of dimensionality also requires efficient schemes for input space exploration and reduction. Here, we explore systematic, data-driven parameter reduction by means of effective parameter identification, starting from current nonlinear manifold-learning techniques enabling state space reduction. Our approach aspires to extend the data-driven determination of effective state variables with the data-driven discovery of effective model parameters, and thus to accelerate the exploration of high-dimensional parameter spaces associated with com-
Given access to input-output information (black-box function evaluation) but no formulas, one might not even suspect that only the single parameter combination $p_{\mathrm{eff}} = p_1 p_2$ matters. Fitting the model to data $f^* = (1, 0, 1)$ in the absence of such information, one would find an entire curve in parameter space that fits the observations. A data fitting algorithm based only on function evaluations could be "confused" by such behavior in declaring convergence. As seen in Fig. 1(a), different initial conditions fed to an optimizer with a practical fitting tolerance $\delta \approx 10^{-3}$ (see figure caption for details) converge to many widely different results, all tracing a level curve of $p_{\mathrm{eff}}$. The subset of good fits is effectively one-dimensional; more importantly, and moving beyond the fit to this particular data, the entire parameter space is foliated by such one-dimensional curves (neutral sets), each composed of points indistinguishable from the model output perspective. Parameter non-identifiability is therefore a structural feature of the model, not an artifact of optimization. The appropriate, intrinsic way to describe parameter space for this problem is through the effective parameter $p_{\mathrm{eff}}$ and its level sets. Consider now the inset of Fig. 1(a), corresponding to the perturbed model $f_\varepsilon(p_1, p_2) = f_0(p_1, p_2) + 2\varepsilon(p_1 - p_2, 0, 0)$ and fit to the same data. Here, the parameters are identifiable and the minimizer $(p_1, p_2)$ is unique: a perfect fit exists. However, the foliation observed for $\varepsilon = 0$ is loosely remembered in the shape of the residual level curves, and the optimizer would be co...
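The non-identifiability described for $\varepsilon = 0$ can be reproduced with a small sketch. The excerpt does not give the formula for $f_0$, so the code assumes a hypothetical model $f_0(p_1, p_2) = (q, \ln q, q^2)$ with $q = p_1 p_2$, chosen only because it depends on the product alone and attains $f^* = (1, 0, 1)$ at $q = 1$. The optimizer is plain gradient descent, and the stopping rule (residual norm below $10^{-3}$) is likewise an assumption about how the tolerance $\delta$ is applied.

```python
import math

TARGET = (1.0, 0.0, 1.0)  # the data f* quoted in the text

def model(p1, p2):
    """Hypothetical f_0: depends on (p1, p2) only through q = p1 * p2."""
    q = p1 * p2
    if q <= 0.0:
        raise ValueError("this sketch assumes p1 * p2 > 0")
    return (q, math.log(q), q * q)

def residual_norm(p1, p2):
    """Euclidean distance between the model output and the target data."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(model(p1, p2), TARGET)))

def fit(p1, p2, step=0.005, tol=1e-3, max_iter=100_000):
    """Gradient descent on 0.5 * ||f_0(p) - f*||^2, stopped at tolerance tol."""
    for _ in range(max_iter):
        if residual_norm(p1, p2) < tol:
            break
        q = p1 * p2
        # Derivative of the objective with respect to q; chain rule gives
        # d/dp1 = dR_dq * p2 and d/dp2 = dR_dq * p1.
        dR_dq = (q - 1.0) + math.log(q) / q + 2.0 * q * (q * q - 1.0)
        p1, p2 = p1 - step * dR_dq * p2, p2 - step * dR_dq * p1
    return p1, p2

# Hypothetical initial conditions in the positive quadrant.
starts = [(0.5, 1.0), (2.0, 0.4), (1.0, 1.5), (0.3, 2.5)]
fits = [fit(p1, p2) for p1, p2 in starts]

for (a0, b0), (a, b) in zip(starts, fits):
    print(f"start=({a0}, {b0}) -> p1={a:.3f}, p2={b:.3f}, p1*p2={a * b:.4f}")
```

Under these assumptions every run meets the tolerance, yet the individual values of `p1` and `p2` differ from run to run while their product stays close to 1, which is the level-curve behavior the paragraph attributes to the effective parameter.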
ZnO deposition in porous γ‐Al2O3 via atomic layer deposition (ALD) is the critical first step for the fabrication of zeolitic imidazolate framework membranes using the ligand‐induced perm‐selectivation process (Science, 361 (2018), 1008–1011). A detailed computational fluid dynamics (CFD) model of the ALD reactor is developed using a finite‐volume‐based code and validated. It accounts for the transport processes within the feeding system and reaction chamber. The simulated precursor spatiotemporal profiles assuming no ALD reaction were used as boundary conditions in modeling diethylzinc reaction/diffusion in porous γ‐Al2O3, the predictions of which agreed with experimental electron microscopy measurements. Further simulations confirmed that the present deposition flux is much less than the upper limit of flux, below which the decoupling of reactor/substrate is an accurate assumption. The modeling approach demonstrated here allows for the design of ALD processes for thin‐film membrane formation including the synthesis of metal–organic framework membranes.