Abstract. We report the development of an algorithm for the retrieval of Total Column Water Vapor (TCWV) from spectra in the blue wavelength range obtained by satellite instruments such as the Ozone Monitoring Instrument (OMI). The algorithm is implemented in an automatic processing pipeline and will be used to generate a long-term data record as part of a MEaSUREs project. TCWV is calculated as the ratio of the Slant Column Density (SCD) to the Air Mass Factor (AMF). Both quantities are improved relative to previous work by incorporating additional constraints or physical processes. For the SCD, we have optimized the retrieval window to 432–466 nm, performed a temperature correction, and employed a new stripe-removal post-processing routine. The use of OMI Collection 4 spectra reduces the fitting uncertainty by ~9 % with respect to Collection 3. For the AMF, we perform on-line radiative transfer calculations using VLIDORT. Over land surfaces, we use bi-directional reflectances based on MODIS products. Over the oceans, we consider surface roughness and water-leaving radiance, and we find that including water-leaving radiance is important for avoiding large TCWV biases. Under relatively clear conditions, the MEaSUREs data are well correlated with the reference datasets, with correlation coefficients of r ~0.9. Over the oceans, the difference MEaSUREs-AMSR_E has an overall mean (median) of ~1 mm (0.6 mm) with a standard deviation of σ ~6.5 mm, though large systematic differences are also found in certain regions. Over land surfaces, the difference MEaSUREs-GPS has an overall mean (median) of -0.7 mm (-0.8 mm) with σ ~5.7 mm. Even a small amount of cloud can introduce large bias and scatter; thus, without further correction, strict data filtering criteria are required. However, the MEaSUREs TCWV data can be corrected through machine learning.
In this regard, under all-sky conditions, the mean bias of the MEaSUREs data is reduced from 4.5 mm (without correction) to -0.3 mm (with correction using LightGBM models), and the standard deviation decreases from 11.8 mm to 3.8 mm. We also examined the representation error of the GPS stations using the dense GEONET data. The within-pixel variance of TCWV follows a power-law dependence on grid size. At 0.25°×0.25° resolution, the derived representation error is about 1.4 mm.
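The ratio TCWV = SCD / AMF stated above can be illustrated with a minimal Python sketch. It assumes the SCD is given in molecules cm^-2 and converts the resulting vertical column to millimetres of liquid-water equivalent, the unit used for the comparisons above. The function names and sample values are hypothetical, chosen only for illustration; they are not taken from the processing pipeline.

```python
# Minimal sketch of TCWV = SCD / AMF with a unit conversion to mm.
# All names and sample values here are illustrative assumptions.

AVOGADRO = 6.02214076e23    # molecules per mole
MOLAR_MASS_H2O = 18.01528   # grams per mole

# A 1 mm layer of liquid water (density 1 g/cm^3) weighs 0.1 g per cm^2.
MOLEC_PER_CM2_PER_MM = 0.1 / MOLAR_MASS_H2O * AVOGADRO


def tcwv_from_scd(scd, amf):
    """Return the vertical column (same unit as scd) as SCD divided by AMF."""
    if amf <= 0:
        raise ValueError("AMF must be positive")
    return scd / amf


def molec_cm2_to_mm(column):
    """Convert a water vapor column from molecules/cm^2 to mm of liquid water."""
    return column / MOLEC_PER_CM2_PER_MM


# Hypothetical inputs: slant column in molecules/cm^2 and a dimensionless AMF.
scd = 1.0e23
amf = 1.5

tcwv_molec = tcwv_from_scd(scd, amf)
tcwv_mm = molec_cm2_to_mm(tcwv_molec)
print(f"TCWV = {tcwv_molec:.3e} molecules/cm^2 = {tcwv_mm:.1f} mm")
```

Because the AMF appears in the denominator, a relative error in the AMF translates into a relative error of similar size in TCWV, which is consistent with the attention given above to the surface and cloud treatment in the AMF calculation.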