Infrared (IR) spectra of adsorbate vibrational modes are sensitive to adsorbate/metal interactions, accurate, and easily obtainable in situ or operando. While they are the gold standard for characterizing single crystals and large nanoparticles, analogous spectra for highly dispersed heterogeneous catalysts consisting of single atoms and ultra-small clusters are lacking. Here, we combine data-based approaches with physics-driven surrogate models to generate synthetic IR spectra from first principles. We bypass the vast combinatorial space of clusters by determining viable, low-energy structures using machine-learned Hamiltonians, genetic algorithm optimization, and grand canonical Monte Carlo calculations. We compute first-principles vibrations on this tractable ensemble and generate single-cluster primary spectra analogous to pure-component gas-phase IR spectra. With such spectra as standards, we predict cluster size distributions from computational and experimental data, demonstrated for CO adsorption on Pd/CeO2(111) catalysts, and quantify uncertainty using Bayesian inference. We discuss extensions for characterizing complex materials towards closing the materials gap.
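To illustrate the idea of using primary spectra as standards, the following minimal sketch decomposes a mixture spectrum into non-negative fractions of single-cluster primary spectra. It is not the procedure used in this work: it substitutes a simple projected-gradient non-negative least-squares fit for the Bayesian inference described above, and the Gaussian band centers, widths, fractions, and the helper names `gaussian_band` and `fit_fractions` are hypothetical placeholders rather than computed or measured values.

```python
import numpy as np

def gaussian_band(wavenumbers, center, width):
    """Hypothetical primary spectrum: a single Gaussian band."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

def fit_fractions(primary, mixture, n_iter=5000):
    """Minimize ||primary @ w - mixture||^2 subject to w >= 0 by
    projected gradient descent, then normalize w to sum to one."""
    gram = primary.T @ primary
    rhs = primary.T @ mixture
    # Step size 1/L, with L the largest eigenvalue of the Gram matrix.
    step = 1.0 / np.linalg.eigvalsh(gram)[-1]
    w = np.zeros(primary.shape[1])
    for _ in range(n_iter):
        # Gradient step followed by projection onto w >= 0.
        w = np.maximum(w - step * (gram @ w - rhs), 0.0)
    return w / w.sum()

# Wavenumber grid in cm^-1 (assumed range for illustration only).
wavenumbers = np.linspace(1800.0, 2200.0, 401)

# One column per cluster size; band positions are illustrative assumptions.
primary = np.column_stack([
    gaussian_band(wavenumbers, 2090.0, 12.0),
    gaussian_band(wavenumbers, 1980.0, 15.0),
    gaussian_band(wavenumbers, 1900.0, 18.0),
])

# Synthetic, noise-free mixture built from known fractions.
true_fractions = np.array([0.5, 0.3, 0.2])
mixture = primary @ true_fractions

estimated = fit_fractions(primary, mixture)
print(estimated)
```

A point estimate of this kind carries no uncertainty information; a Bayesian treatment would instead place a prior on the fractions and report a posterior distribution over them.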