Numerical simulations of multiphase flows are crucial in numerous engineering applications, but they are often limited by the computational cost of solving the Navier–Stokes (NS) equations. Developing surrogate models, in turn, typically requires involved algebra and several simplifying assumptions. Here, we present a data-driven workflow in which data from a handful of detailed NS simulations are used to build a reduced-order model for a prototypical vertically falling liquid film. We develop a physics-agnostic model for the film thickness that agrees far better with the NS solutions than the asymptotic Kuramoto–Sivashinsky (KS) equation does. We also develop two variants of physics-infused models, which calibrate a low-fidelity model (i.e. the KS equation) against a few high-fidelity NS data. Finally, we develop models that predict missing data from partial information: the film amplitude, the full velocity field, and even the flow parameter. This is achieved with so-called ‘gappy diffusion maps’, which compare favourably with their linear counterpart, gappy POD.
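For readers unfamiliar with the linear baseline named above, the sketch below shows the basic idea of gappy POD: fit the coefficients of a few POD modes by least squares using only the observed entries of a snapshot, then use the full modes to fill in the rest. It is a minimal NumPy illustration under stated assumptions, not the authors' implementation, and it does not cover gappy diffusion maps. The function names `pod_basis` and `gappy_pod` are hypothetical, and the synthetic sine/cosine snapshots stand in for film-thickness data.

```python
import numpy as np


def pod_basis(snapshots, rank):
    """Return the mean and leading POD modes of a snapshot matrix (one snapshot per column)."""
    mean = snapshots.mean(axis=1, keepdims=True)
    # The left singular vectors of the mean-centred snapshots are the POD modes.
    U, _, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean[:, 0], U[:, :rank]


def gappy_pod(u_partial, observed, mean, modes):
    """Reconstruct a full snapshot that is known only where the boolean mask `observed` is True."""
    # Least-squares fit of the modal coefficients, using the observed entries only.
    coeffs, *_ = np.linalg.lstsq(
        modes[observed], u_partial[observed] - mean[observed], rcond=None
    )
    # Evaluate the fitted modal expansion everywhere, including the gaps.
    return mean + modes @ coeffs


# Hypothetical synthetic data: 30 snapshots on 200 grid points, each a random
# combination of sin(x) and cos(x), so two POD modes span the centred data.
x = np.linspace(0.0, 2.0 * np.pi, 200)
rng = np.random.default_rng(0)
amps = rng.normal(size=(2, 30))
snapshots = np.outer(np.sin(x), amps[0]) + np.outer(np.cos(x), amps[1])
mean, modes = pod_basis(snapshots, rank=2)

# A new snapshot observed at only every 10th grid point.
u_true = 0.7 * np.sin(x) - 1.3 * np.cos(x)
observed = np.zeros(x.size, dtype=bool)
observed[::10] = True
u_partial = np.where(observed, u_true, np.nan)  # gaps marked as NaN

u_filled = gappy_pod(u_partial, observed, mean, modes)
print("max reconstruction error:", np.max(np.abs(u_filled - u_true)))
```

Because the new snapshot lies exactly in the span of the two modes, the 20 observed points recover it to round-off accuracy here. On real data the reconstruction is confined to a linear subspace, which is the limitation a nonlinear, manifold-based approach aims to address.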