Evolutionary program ensembles are developed and tested for minimum temperature forecasts at Chicago, Illinois, at forecast ranges of 36, 60, 84, 108, 132, and 156 h. At every forecast range examined, the evolutionary program ensemble outperforms the 21-member GFS model output statistics (MOS) ensemble in both root-mean-square error and Brier skill score. The root-mean-square error advantage widens with forecast range, from 0.188°F at 36 h to 1.538°F at 156 h, while the probabilistic skill remains positive throughout. At all forecast ranges, probabilistic forecasts of abnormal conditions are particularly skillful compared to the raw GFS guidance. The evolutionary program's reliance on particular forecast inputs differs from that of multiple linear regression models: it relies less on the GFS MOS temperature and more on alternative data such as upstream temperatures at the time of forecast issuance, time of year, and forecasts of wind speed, precipitation, and cloud cover. As forecast range increases, this weighting shifts away from current observations and toward seasonal (climatological) measures. Using two different forms of ensemble member subselection, a Bayesian model combination calibration is tested on both ensembles. This calibration had limited effect on evolutionary program ensemble skill but improved MOS ensemble performance, reducing but not eliminating the skill gap between them. The largest skill differentials occurred at the longest forecast ranges, beginning at 132 h. A hybrid, calibrated ensemble provided some further increase in skill.
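The comparison above rests on two standard verification measures, root-mean-square error for point forecasts and the Brier skill score for probabilistic forecasts. The following is a minimal sketch of how these are conventionally computed, not the study's own verification code. It assumes that an ensemble probability is the fraction of members at or below a threshold (for example, an abnormally cold minimum temperature) and that the caller supplies the reference forecast (such as raw GFS guidance or climatology). The function names and the threshold rule are hypothetical illustrations.

```python
import math
from typing import Sequence


def rmse(forecasts: Sequence[float], observations: Sequence[float]) -> float:
    """Root-mean-square error between point forecasts and observations (same units, e.g. degrees F)."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    squared = sum((f - o) ** 2 for f, o in zip(forecasts, observations))
    return math.sqrt(squared / len(forecasts))


def ensemble_probability(members: Sequence[float], threshold: float) -> float:
    """Hypothetical event rule: fraction of ensemble members at or below the threshold."""
    if not members:
        raise ValueError("ensemble must have at least one member")
    return sum(1 for m in members if m <= threshold) / len(members)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and binary outcomes (0 or 1)."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    probs: Sequence[float], ref_probs: Sequence[float], outcomes: Sequence[int]
) -> float:
    """BSS = 1 - BS / BS_ref; positive values mean the forecast beats the reference."""
    bs_ref = brier_score(ref_probs, outcomes)
    if bs_ref == 0.0:
        raise ValueError("reference Brier score is zero; skill score is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    members = [18.0, 21.0, 24.0, 27.0]
    p_cold = ensemble_probability(members, threshold=22.0)  # 2 of 4 members -> 0.5
    print(p_cold)
    print(brier_skill_score([0.9, 0.1], [0.5, 0.5], [1, 0]))
```

A skill score computed this way is positive whenever the candidate's Brier score is lower than the reference's, which is the sense in which probabilistic skill "remains positive."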