Objective
Identifying children at high risk of developing obesity offers a critical window to change the course of the disease before it becomes established. Numerous studies have attempted this, but practical limitations remain, including (i) reliance on data absent from routinely collected pediatric records (such as prenatal data), (ii) a focus on prediction at a single age (and hence no testing across ages), and (iii) weak predictive performance or inadequate validation of the results.

Methods
A customized sequential deep learning model was built to predict the risk of childhood obesity, with a particular focus on capturing temporal patterns. The model was trained only on routinely collected electronic health records (EHRs), using a list of features identified by a group of clinical experts and sourced from 36,191 diverse children aged 0 to 10. The model was evaluated using extensive discrimination, calibration, and utility analyses, and was validated temporally, geographically, and across various subgroups.

Results
Our results are mostly better than (and never worse than) those of all previous studies, including studies that focus on single-age predictions or link EHRs to external data. Specifically, the model consistently achieved an area under the receiver operating characteristic curve (AUROC) above 0.8 (with most cases around 0.9) for predicting obesity within the next 3 years for children aged 2 to 7. The temporal, geographic, and subgroup validation results indicate that the model is robust. Furthermore, the most influential predictors in the model match important known risk factors for obesity.

Conclusions
Our model can predict the risk of obesity in young children using only routinely collected EHR data, which greatly facilitates its integration with the periodicity schedule. The model can serve as an objective screening tool to inform prevention efforts, especially by supporting delicate interactions between providers and families in primary care settings.