Aims
Risk stratification of atypical ductal hyperplasia (ADH) and ductal carcinoma in situ (DCIS), diagnosed using breast biopsy, has great clinical significance. Clinical trials are currently exploring the possibility of active surveillance for low‐risk lesions, whereas axillary lymph node staging may be considered during surgical planning for high‐risk lesions. We aimed to develop a machine‐learning algorithm based on whole‐slide images of breast biopsy specimens and clinical information to predict the risk of upstaging to invasive breast cancer after wide excision.

Methods and Results
Patients diagnosed with ADH/DCIS on breast biopsy were included in this study: 592 patients (740 slides) in the development cohort and 141 patients (198 slides) in the independent testing cohort. Histological grading of the lesions was independently evaluated by two pathologists. Clinical information, including biopsy method, lesion size, and Breast Imaging Reporting and Data System (BI‐RADS) classification of ultrasound and mammograms, was collected. The algorithm, Deep DCIS, consisted of three deep neural networks that evaluate nuclear grade, necrosis, and stromal reactivity. Its output comprised five parameters: total patches, lesion extent, Deep Grade, Deep Necrosis, and Deep Stroma. Deep DCIS output correlated strongly with the pathologists' evaluations at both the slide and patient levels. All five parameters of Deep DCIS were significantly associated with upstaging to invasive carcinoma in subsequent wide excisional specimens. Using multivariate logistic regression, Deep DCIS predicted upstaging to invasive carcinoma with an area under the curve (AUC) of 0.81, outperforming the two pathologists' evaluations (AUC, 0.71 and 0.69). After including clinical and hormone receptor status information, performance further improved (AUC, 0.87).
This combined model retained its predictive power in two subgroup analyses: the first subgroup included unequivocal DCIS (excluding cases of ADH and DCIS suspicious for microinvasion) (AUC, 0.83), while the second excluded cases of high‐grade DCIS (AUC, 0.81). The model was validated in an independent testing cohort (AUC, 0.81).

Conclusion
This study demonstrated that deep‐learning models can refine histological evaluation of ADH and DCIS on breast biopsies, which may help guide future treatment planning.
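
The AUC values reported in this abstract can be read as the probability that a randomly chosen upstaged case receives a higher predicted risk than a randomly chosen non‐upstaged case. The following is a minimal sketch of that calculation, not the study's code: the function name `auc` and the labels and risk scores are hypothetical and serve only to illustrate the metric.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = upstaged to invasive carcinoma, 0 = not upstaged.
upstaged = [1, 1, 1, 0, 0, 0, 0]
# Hypothetical predicted risks, e.g. from a logistic regression model.
risk = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print(f"AUC = {auc(upstaged, risk):.2f}")
```

An AUC of 0.5 corresponds to chance‐level ranking and 1.0 to perfect separation of upstaged from non‐upstaged cases.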