Discovery of new perovskite materials is motivated by a broad range of materials applications and accelerated by recent advances in machine learning (ML). We herein report dataset augmentation, benchmarking, and interrogation for an ongoing experimental campaign consisting of 9483 halide perovskite synthesis experiments. To address limitations in previous work, we developed an improved description of the reactant concentrations in the experiments (validated against experimental observations) and performed experiments quantifying the excess volume of mixing of γ-butyrolactone/formic acid mixtures used in the perovskite syntheses. Combining this improved description of reactant concentration with other physicochemical features of the reactants, we constructed 1108 ML models to elucidate how the algorithm (k-nearest neighbors, linear support-vector machine, and gradient boosted tree), feature set (12 in total), preprocessing regime (e.g., standardization), and training data holdout scheme affect ML predictive ability. ML comparisons showed that the limited chemical accuracy of less sophisticated physical models used in a dataset does not hinder interpolative model performance. Analysis of feature contributions showed how ML models "learn" competitive representations for concentration from raw experimental descriptions. Interrogation of the most performant models indicated that the numerical values of the physicochemical features were not important; rather, these features were used to identify, and interpolate within, a particular reactant set. Compared against basic benchmarks, ML models were capable of rudimentary extrapolation to untrained chemical systems, and models that included the newly developed chemical features were more reliable than models trained without them.
These results illustrate how a stepwise comparative approach to machine learning can provide insight into what and how much models are "learning" for a given prediction task.
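The abstract describes the improved concentration description only at a high level. The sketch below shows why a measured excess volume of mixing changes a computed concentration. It rests on several assumptions. The binary solvent's volume is taken as V = n_total (x_1 V_1 + x_2 V_2 + V^E), where V_i is a pure-component molar volume and V^E is the excess molar volume. The dissolved reactants are assumed to add no volume, a simplification not taken from the source. The `Component` class, the function names, and all numbers are hypothetical illustrations, not the authors' model or measured data.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A pure liquid component of a binary solvent mixture (hypothetical helper)."""
    name: str
    moles: float       # amount in the mixture, mol
    molar_mass: float  # g/mol
    density: float     # pure-component density, g/mL

    @property
    def molar_volume(self) -> float:
        """Pure-component molar volume, mL/mol."""
        return self.molar_mass / self.density


def mixture_volume_ml(a: Component, b: Component, excess_molar_volume: float) -> float:
    """Binary mixture volume in mL: n_total * (x_a*V_a + x_b*V_b + V_E).

    excess_molar_volume (V_E, mL/mol) is the deviation from additive volumes
    at this composition; passing 0.0 gives the ideal-mixing approximation.
    """
    n_total = a.moles + b.moles
    x_a, x_b = a.moles / n_total, b.moles / n_total
    return n_total * (x_a * a.molar_volume + x_b * b.molar_volume + excess_molar_volume)


def molarity(solute_moles: float, volume_ml: float) -> float:
    """Solute concentration in mol/L, assuming the solute adds no volume."""
    return solute_moles / (volume_ml / 1000.0)


if __name__ == "__main__":
    # Approximate pure-component properties; the V_E value is a made-up placeholder.
    gbl = Component("gamma-butyrolactone", moles=0.05, molar_mass=86.09, density=1.13)
    fah = Component("formic acid", moles=0.02, molar_mass=46.03, density=1.22)
    solute_mol = 0.005  # hypothetical amount of dissolved reactant
    for v_e in (0.0, -0.5):
        vol = mixture_volume_ml(gbl, fah, v_e)
        print(f"V_E={v_e:+.1f} mL/mol -> V={vol:.3f} mL, c={molarity(solute_mol, vol):.4f} M")
```

With a negative V^E the mixture occupies less volume than the additive estimate, so the computed concentration rises. A positive V^E has the opposite effect.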
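Interpolating within a reactant set differs from extrapolating to an untrained chemical system. A leave-one-chemical-system-out holdout can illustrate the difference. The sketch below makes several assumptions:

* Each experiment carries a hypothetical `system_id`.
* A plain k-nearest-neighbors classifier uses Euclidean distance.
* A majority-class predictor stands in for a "basic benchmark".

It is not the authors' pipeline, and the toy data are invented.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train, query, k=3):
    """Label predicted by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda pt: math.dist(pt[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def majority_baseline(train):
    """Most frequent training label, ignoring features (a simple benchmark)."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def leave_one_system_out(rows, k=3):
    """Hold out each chemical system in turn and train on all the others.

    rows: iterable of (system_id, features, label).
    Returns {system_id: (knn_accuracy, baseline_accuracy)} on the held-out system.
    """
    by_system = defaultdict(list)
    for system_id, features, label in rows:
        by_system[system_id].append((features, label))

    results = {}
    for held_out, test in by_system.items():
        # Training data never contains any experiment from the held-out system.
        train = [pt for sid, pts in by_system.items() if sid != held_out for pt in pts]
        baseline = majority_baseline(train)
        knn_acc = sum(knn_predict(train, x, k) == y for x, y in test) / len(test)
        base_acc = sum(baseline == y for _, y in test) / len(test)
        results[held_out] = (knn_acc, base_acc)
    return results


if __name__ == "__main__":
    # Invented toy data: one feature per experiment, label 1 = "success".
    toy = [
        ("A", (0.1,), 0), ("A", (0.9,), 1),
        ("B", (0.2,), 0), ("B", (0.8,), 1),
        ("C", (0.3,), 0), ("C", (0.7,), 1), ("C", (0.95,), 1),
    ]
    for system_id, (knn_acc, base_acc) in leave_one_system_out(toy, k=1).items():
        print(system_id, knn_acc, base_acc)
```

On this toy data, kNN beats the majority baseline on held-out systems A and B only because the invented label depends on the feature value in the same way across systems. Real untrained chemical systems need not behave so conveniently. That is why the comparison against a benchmark, rather than raw accuracy alone, is the informative quantity.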