We evaluated the efficacy of large language models (LLMs), specifically generative pre‐trained transformer‐4 (GPT‐4), in predicting pregnancy following in vitro fertilization (IVF) treatment and compared its accuracy with the results of the original published study. Our findings revealed that GPT‐4 can autonomously develop and refine advanced machine learning models for pregnancy prediction with minimal human intervention. The prediction accuracy was 0.79 and the area under the receiver operating characteristic curve (AUROC) was 0.89, matching or exceeding the values reported in the original study (0.78 for accuracy and 0.87 for AUROC). These results suggest that LLMs can facilitate data processing, optimize machine learning models for predicting IVF success rates, and provide methods for interpreting the data. This capacity can help bridge the knowledge gap between data scientists and medical personnel in addressing the most pressing clinical challenges. However, further experiments on larger and more diverse datasets are needed to validate these findings and to support broader application of LLMs in assisted reproduction.