High-throughput experimentation has revolutionized data-driven experimental sciences and opened the door to the application of machine learning techniques. Nevertheless, the quality of any data analysis strongly depends on the quality of the data and, specifically, on the degree to which random effects in the experimental data-generating process are quantified and accounted for. Accordingly, calibration, i.e., the quantitative association between observed quantities and measurement responses, is a core element of many workflows in experimental sciences.
Particularly in life sciences, univariate calibration, often involving non-linear saturation effects, must be performed to extract quantitative information from measured data. At the same time, the estimation of uncertainty is inseparably connected to quantitative experimentation. Adequate calibration models that describe not only the input/output relationship in a measurement system but also its inherent measurement noise are required. Because of its mathematical nature, statistically robust calibration modeling remains a challenge for many practitioners, even though it is extremely beneficial for machine learning applications.
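To make the idea of a calibration model that covers both the input/output relationship and the measurement noise more concrete, the following is a minimal sketch using only the Python standard library. It is not the API of calibr8; the function names, the symmetric logistic shape of the saturation curve, the linear dependence of the noise scale on the mean response, and all parameter values are assumptions chosen for illustration.

```python
import math


def logistic_response(x, lower, upper, x_half, slope):
    """Mean measurement response for an independent value x (e.g. a concentration).

    Assumed shape: a symmetric logistic curve that saturates at `upper`.
    """
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (x - x_half)))


def noise_scale(mu, scale_0, scale_1):
    """Assumed noise model: standard deviation grows linearly with the mean response."""
    return scale_0 + scale_1 * mu


def log_likelihood(x, y, theta):
    """Log-likelihood of observations y at independent values x under Normal noise."""
    lower, upper, x_half, slope, scale_0, scale_1 = theta
    total = 0.0
    for xi, yi in zip(x, y):
        mu = logistic_response(xi, lower, upper, x_half, slope)
        sigma = noise_scale(mu, scale_0, scale_1)
        total += -0.5 * math.log(2.0 * math.pi * sigma**2) - 0.5 * ((yi - mu) / sigma) ** 2
    return total


# Hypothetical calibration data: noise-free responses at five known concentrations.
theta_true = (0.0, 10.0, 5.0, 1.0, 0.1, 0.02)
x_cal = [1.0, 3.0, 5.0, 7.0, 9.0]
y_cal = [logistic_response(xi, *theta_true[:4]) for xi in x_cal]

# A parameter set with a shifted inflection point explains the data worse.
theta_shifted = (0.0, 10.0, 6.0, 1.0, 0.1, 0.02)
print(log_likelihood(x_cal, y_cal, theta_true) > log_likelihood(x_cal, y_cal, theta_shifted))
```

Maximizing such a log-likelihood over the parameters yields a fitted calibration model. Because the noise scale is part of the model, the same function can later be used to score how plausible a candidate concentration is given a new measurement.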
In this work, we present a bottom-up conceptual and computational approach that solves many problems of understanding and implementing non-linear, empirical calibration modeling for quantification of analytes and process modeling. The methodology is first applied to the optical measurement of biomass concentrations in a high-throughput cultivation system, then to the quantification of glucose by an automated enzymatic assay. We implemented the conceptual framework in two Python packages, calibr8 and murefi, with which we demonstrate how to make uncertainty quantification for various calibration tasks more accessible. Our software packages enable more reproducible and automatable data analysis routines than the workflows commonly observed in life sciences.
Subsequently, we combine the previously established calibration models with a hierarchical Monod-like ordinary differential equation model of microbial growth to describe multiple replicates of Corynebacterium glutamicum batch cultures. Key process model parameters are learned by both maximum likelihood estimation and Bayesian inference, highlighting the flexibility of the statistical and computational framework.
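As an illustration of what a Monod-like growth model for several replicates can look like, the sketch below integrates the standard Monod equations for biomass X and substrate S with a hand-written fourth-order Runge-Kutta scheme. It does not use murefi and is not the model specification of this work: the parameter names and values, the initial conditions, and the choice that replicates share the kinetic parameters while differing only in initial biomass are assumptions made for the example.

```python
def monod_rhs(state, mu_max, k_s, y_xs):
    """Right-hand side of the Monod model: dX/dt and dS/dt."""
    x, s = state
    growth = mu_max * s / (k_s + s) * x   # biomass formation rate
    return (growth, -growth / y_xs)       # substrate is consumed with yield y_xs


def simulate(x0, s0, mu_max, k_s, y_xs, t_end, dt):
    """Integrate the ODE from t=0 to t_end with classic fixed-step RK4."""
    state = (x0, s0)
    for _ in range(int(round(t_end / dt))):
        k1 = monod_rhs(state, mu_max, k_s, y_xs)
        k2 = monod_rhs(tuple(v + 0.5 * dt * k for v, k in zip(state, k1)), mu_max, k_s, y_xs)
        k3 = monod_rhs(tuple(v + 0.5 * dt * k for v, k in zip(state, k2)), mu_max, k_s, y_xs)
        k4 = monod_rhs(tuple(v + dt * k for v, k in zip(state, k3)), mu_max, k_s, y_xs)
        state = tuple(
            v + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for v, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


# Hypothetical shared kinetic parameters (units: 1/h, g/L, g biomass per g substrate).
shared = dict(mu_max=0.4, k_s=0.02, y_xs=0.6)
s0 = 20.0

# Assumed replicate-specific initial biomass concentrations in g/L.
replicate_x0 = {"A01": 0.20, "A02": 0.25, "A03": 0.30}

final_states = {
    name: simulate(x0, s0, t_end=10.0, dt=0.001, **shared)
    for name, x0 in replicate_x0.items()
}
for name, (x_end, s_end) in final_states.items():
    print(name, round(x_end, 3), round(s_end, 3))
```

In a parameter estimation setting, the simulated trajectories would be compared with the measurements through the likelihood of a calibration model, and the shared and replicate-specific parameters would then be estimated by maximum likelihood or sampled by Bayesian inference.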