Accurate and timely pregnancy diagnosis is an important component of effective herd management in dairy cattle. Predicting pregnancy from Fourier-transform mid-infrared (FT-MIR) spectroscopy data is of particular interest because these data are often already available from routine milk testing. The purpose of this study was to evaluate how well pregnancy status could be predicted in a large data set of 1,161,436 FT-MIR milk spectra records from 863,982 mixed-breed, pasture-based New Zealand dairy cows managed in seasonal calving systems. Three strategies were assessed for defining nonpregnant cows when partitioning records by pregnancy status in the training population. Two of these used records only for cows with a subsequent calving, whereas the third also included records for cows without a subsequent calving. For each partitioning strategy, partial least squares discriminant analysis models were developed: spectra from all cows in 80% of herds were used to train the models, and predictions for cows in the remaining 20% of herds were used for validation. A separate data set, in which pregnancy status had been assigned according to the presence of pregnancy-associated glycoproteins (PAG) in the milk samples, served as a secondary validation. We examined different ways of accounting for stage of lactation in the prediction models, either by including it as an effect in the model or by pre-adjusting the spectra before fitting the model. For a subset of strategies, we also assessed prediction accuracies from deep learning approaches using either the raw spectra or images of the spectra. Across all strategies, prediction accuracies were highest for models using unadjusted spectra as predictors.
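The herd-independent validation described above keeps every cow from a given herd entirely in either the training set or the validation set, so that herd-level effects cannot leak between them. A minimal sketch of such a split follows, assuming that records carry a hypothetical `herd_id` field and that herds are assigned at random (the abstract states only the 80% proportion, not how herds were chosen); `split_by_herd` is an illustrative helper, not the authors' code.

```python
import random


def split_by_herd(records, train_fraction=0.8, seed=42):
    """Assign whole herds to training or validation so no herd appears in both.

    Assumption: each record is a dict with a hypothetical 'herd_id' key, and
    herds are shuffled at random before the first `train_fraction` are taken.
    """
    herds = sorted({r["herd_id"] for r in records})  # unique herd IDs
    rng = random.Random(seed)                       # reproducible shuffle
    rng.shuffle(herds)
    n_train = round(train_fraction * len(herds))    # number of training herds
    train_herds = set(herds[:n_train])
    train = [r for r in records if r["herd_id"] in train_herds]
    valid = [r for r in records if r["herd_id"] not in train_herds]
    return train, valid
```

For example, 10 herds of 3 cows each yield 8 training herds (24 records) and 2 validation herds (6 records). Splitting by herd rather than by record is what makes the validation "herd-independent".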
Strategies restricted to cows with a subsequent calving performed well in herd-independent validation, with sensitivities above 0.79, specificities above 0.91, and area under the receiver operating characteristic curve (AUC) values above 0.91. However, for these strategies, the specificity for predicting nonpregnant cows in the external PAG data set was poor (0.002 to 0.04). The best-performing models were those that included records for cows without a subsequent calving and used unadjusted spectra and days in milk as predictors; these models gave consistent results across the training, herd-independent validation, and PAG data sets. For the partial least squares discriminant analysis model, sensitivity was 0.71, specificity was 0.54, and AUC was 0.68 in the PAG data set; for an image-based deep learning model, sensitivity was 0.74, specificity was 0.52, and AUC was 0.69. Our results demonstrate that in pasture-based seasonal calving herds, confounding between pregnancy status and spectral changes associated with stage of lactation can inflate prediction accuracies. When the effect of this confounding was reduced, prediction accuracies were not high enough for FT-MIR predictions to be used as the sole indicator of pregnancy status.
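To make the reported metrics concrete, the sketch below computes sensitivity (the proportion of pregnant cows correctly predicted pregnant), specificity (the proportion of nonpregnant cows correctly predicted nonpregnant), and AUC. It assumes pregnant is coded as 1 and nonpregnant as 0, and it uses a 0.5 score threshold purely for illustration; the abstract does not state the thresholds used. AUC is computed here as the probability that a randomly chosen pregnant cow receives a higher score than a randomly chosen nonpregnant cow, which is equivalent to the area under the ROC curve.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity with pregnant = 1, nonpregnant = 0 (assumed coding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pregnant, predicted pregnant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pregnant, predicted nonpregnant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonpregnant, predicted nonpregnant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nonpregnant, predicted pregnant
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5.

    This pairwise version is O(n_pos * n_neg) and is meant only for illustration;
    a rank-based computation would be used for data sets of this size.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold(scores, cutoff=0.5):
    """Convert model scores to 0/1 predictions at an illustrative cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]
```

With labels `[1, 1, 1, 0, 0, 0]` and scores `[0.9, 0.7, 0.4, 0.6, 0.2, 0.1]`, sensitivity and specificity are both 2/3 at the 0.5 cutoff, and the AUC is 8/9. Because AUC does not depend on a cutoff, it can remain moderate even when specificity at a particular threshold is very low, as in the PAG results above.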