In many contexts it is extremely costly to perform enough high quality experimental measurements to accurately parameterize a predictive quantitative model. However, it is often much easier to carry out large numbers of experiments that indicate whether each sample is above or below a given threshold. Can many such categorical or "coarse" measurements be combined with a much smaller number of high resolution or "fine" measurements to yield accurate models? Here, we demonstrate an intuitive strategy, inspired by statistical physics, wherein the coarse measurements are used to identify the salient features of the data, while the fine measurements determine the relative importance of these features. A linear model is inferred from the fine measurements, augmented by a quadratic term that captures the correlation structure of the coarse data. We illustrate our strategy by considering the problems of predicting the antimalarial potency and aqueous solubility of small organic molecules from their 2D molecular structure.

A large class of scientific questions asks whether dependent variables can be accurately predicted by using training data to learn the parameters of quantitative models. Classical statistics shows that this is possible if sufficiently many high resolution measurements are available, though the cost of performing these experiments can be prohibitive. On the other hand, in many settings, it can be straightforward to evaluate whether a measurement is above or below a certain threshold, raising the question of how such measurements can be incorporated into the modelling framework.

Examples abound in disparate fields. For instance, predicting the solubility of organic molecules is a fundamental challenge in physical chemistry [1]. Although accurate measurements are extremely difficult to obtain [2], determining whether a molecule is soluble at a particular concentration is comparatively simple.
Similarly, in drug discovery, biochemical assays that determine whether a molecule binds to a given receptor are much simpler than measuring protein-ligand binding affinity [3]. In protein biophysics, a key challenge is to predict the effect of amino acid changes on protein phenotype. Here, threshold measurements are naturally provided by homologous sequences from the same protein family [4][5][6][7][8]. In contrast, experimentally measuring the phenotypic change is much more difficult. A related problem is to predict the viral fitness landscape given HIV sequences obtained from patients; again, collecting patient samples is much easier than measuring fitness directly [9,10]. In single-cell RNA sequencing, developing decomposition methods that extract the correlation structure of shallow gene expression measurements is an ongoing challenge [11,12].

Despite the ubiquity of this problem, to our knowledge there is no principled method for combining numerous binary/categorical ("coarse") measurements with fewer quantitative ("fine") measurements to produce a predictive model. Although regression approaches can account for a prior estima...