The massive scale of concrete construction constrains the raw material feedstocks that can be considered: they must be both universally abundant and amenable to economical, energy-efficient processing. While significant improvements, from more efficient cement and concrete production to increased service life, have been realized over the past decades through traditional research paradigms, non-incremental innovations are now necessary to meet increasingly urgent needs, at a time when innovations in materials create even greater complexity. Data science is accelerating the rate of discovery and innovation for material systems. This review addresses machine learning and other data-analytical techniques that use various forms of variable representation for cementitious systems. These techniques include physicochemical and cheminformatics-guided approaches to chemical admixture design, the use of materials informatics to develop process-structure-property linkages for quantifying increased service life, and change-point detection for assessing pozzolanicity in candidate supplementary cementitious materials (SCMs). The latent variables derived from these techniques, coupled with approaches to dimensionality reduction driven both algorithmically and by domain knowledge, provide robust feature representations for cement-based materials and allow for more accurate models with greater generalization capability, resulting in a powerful design tool for infrastructure materials.