We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and find our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.deep learning | deep convolutional neural networks | breast cancer screening | mammography B reast cancer is the second leading cancer-related cause of death among women in the US. In 2014, over 39 million screening and diagnostic mammography exams were performed in the US. It is estimated that in 2015 232,000 women were diagnosed with breast cancer and approximately 40,000 died from it (1). Although mammography is the only imaging test that has reduced breast cancer mortality (2-4), there has been discussion regarding the potential harms of screening, including false positive recalls and associated false positive biopsies. The vast majority of the 10-15% of women asked to return following an inconclusive screening mammogram undergo another mammogram and/or ultrasound for clarification. After the additional imaging exams, many of these findings are determined as benign and only 10-20% are recommended to undergo a needle biopsy for further work-up. Among these, only 20-40% yield a diagnosis of cancer (5). Evidently, there is an unmet need to shift the balance of routine breast cancer screening towards more benefit and less harm.Traditional computer-aided detection (CAD) in mammography is routinely used by radiologists to assist with image interpretation, despite multicenter studies showing these CAD programs do not improve their diagnostic performance (6).These CAD programs typically use handcrafted features to mark sites on a mammogram that appear distinct from normal tissue structures. The radiologist decides whether to recall these findings, determining clinical significance and actionability. Recent developments in deep learning (7)-in particular, deep convolutional neural networks (CNNs) (8-12)-open possibilities for creating a new generation of CAD-like tools.This paper makes several contributions. Primarily, we train and evaluate a set of stro...