Background
Several breast cancer risk-assessment models exist, but few studies have evaluated the predictive accuracy of multiple models in large screening populations.
Methods
We evaluated the performance of the BRCAPRO, Gail, Claus, Breast Cancer Surveillance Consortium (BCSC), and Tyrer-Cuzick models in predicting 6-year risk of breast cancer among 35 921 women aged 40–84 years who underwent screening mammography at Newton-Wellesley Hospital from 2007 to 2009. We assessed discrimination using the area under the receiver operating characteristic curve (AUC) and calibration using the ratio of observed-to-expected (O/E) cases. We also calculated the square root of the Brier score and the positive and negative predictive values of each model.
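The four measures named above can all be computed from per-woman predicted 6-year risks and observed outcomes. Below is a minimal Python sketch. It assumes one predicted risk and one 0/1 outcome per woman and ignores censoring and competing risks, which a real cohort analysis would have to handle. The function names, the toy data, and the risk threshold used for the predictive values are hypothetical, because the abstract does not state which cutoff was used.

```python
import math
from typing import Sequence, Tuple


def observed_to_expected(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Calibration: observed cases divided by expected cases (sum of predicted risks)."""
    return sum(outcomes) / sum(risks)


def auc(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Discrimination: probability that a randomly chosen case has a higher
    predicted risk than a randomly chosen non-case (ties count as 1/2).
    Quadratic in cohort size; a rank-based version scales better."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(controls))


def root_brier(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Square root of the mean squared difference between risk and outcome."""
    return math.sqrt(sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(outcomes))


def predictive_values(outcomes: Sequence[int], risks: Sequence[float],
                      threshold: float) -> Tuple[float, float]:
    """PPV and NPV after labelling risk >= threshold as 'high risk'.
    The threshold is a HYPOTHETICAL input; returns NaN for an empty group."""
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    tn = sum(1 for r, y in zip(risks, outcomes) if r < threshold and y == 0)
    fn = sum(1 for r, y in zip(risks, outcomes) if r < threshold and y == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Hypothetical toy data, not from the study.
outcomes = [1, 0, 0, 1, 0, 0]
risks = [0.04, 0.01, 0.02, 0.03, 0.05, 0.01]
print("O/E:", observed_to_expected(outcomes, risks))
print("AUC:", auc(outcomes, risks))  # 0.75
print("sqrt(Brier):", root_brier(outcomes, risks))
print("PPV, NPV:", predictive_values(outcomes, risks, threshold=0.03))
```

Note that the AUC depends only on how women are ranked, whereas O/E and the Brier score also depend on the absolute size of the predicted risks, which is why a model can discriminate moderately yet still be miscalibrated.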
Results
Our results confirmed the good calibration and comparable, moderate discrimination of the BRCAPRO, Gail, Tyrer-Cuzick, and BCSC models. In the full study population, the Gail model had a slightly better O/E ratio and AUC (O/E = 0.98, 95% confidence interval [CI] = 0.91 to 1.06; AUC = 0.64, 95% CI = 0.61 to 0.65) than BRCAPRO (O/E = 0.94, 95% CI = 0.88 to 1.02; AUC = 0.61, 95% CI = 0.59 to 0.63) and Tyrer-Cuzick version 8 (O/E = 0.84, 95% CI = 0.79 to 0.91; AUC = 0.62, 95% CI = 0.60 to 0.64). Among women with available breast density information, the BCSC model had the highest AUC (O/E = 0.97, 95% CI = 0.89 to 1.05; AUC = 0.64, 95% CI = 0.62 to 0.66). All models had poorer predictive accuracy for human epidermal growth factor receptor 2 (HER2)-positive and triple-negative breast cancers than for hormone receptor-positive, HER2-negative breast cancers.
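An O/E ratio below 1, as for Tyrer-Cuzick here, means the model predicted more cancers than were observed; a ratio above 1 means it underpredicted. The abstract does not say how the O/E confidence intervals were computed. One common approximation, assumed here and not necessarily the authors' method, treats the observed count O as Poisson, so the standard error of log(O/E) is about 1/√O. A minimal sketch with hypothetical counts:

```python
import math
from typing import Tuple


def oe_ratio_ci(observed: int, expected: float, z: float = 1.96) -> Tuple[float, float, float]:
    """O/E ratio with an approximate CI, ASSUMING the observed count is Poisson
    so that SE(log O/E) is about 1/sqrt(observed)."""
    if observed <= 0:
        raise ValueError("need at least one observed case")
    ratio = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    # The interval is symmetric on the log scale, asymmetric on the ratio scale.
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Hypothetical counts, not from the study: the same ratio with 4x the cases
# gives an interval half as wide on the log scale.
for obs, exp_cases in [(100, 100.0), (400, 400.0)]:
    ratio, lo, hi = oe_ratio_ci(obs, exp_cases)
    print(f"O={obs}: O/E = {ratio:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

Under this approximation, the interval width is driven by the number of observed cancers rather than the number of women screened.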
Conclusions
In a large cohort of patients undergoing mammography screening, existing risk prediction models had similar, moderate predictive accuracy and good calibration overall. Models that incorporate additional genetic and nongenetic risk factors and estimate risk of tumor subtypes may further improve breast cancer risk prediction.
The objective of this study was to evaluate a natural language processing (NLP) algorithm that determines American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) final assessment categories from radiology reports. This HIPAA-compliant study was approved by the institutional review board with a waiver of informed consent. This cross-sectional study included 1,165 breast imaging reports from 2009 in the electronic medical record (EMR) of a tertiary care academic breast imaging center. Reports included screening mammography, diagnostic mammography, breast ultrasound, combined diagnostic mammography and breast ultrasound, and breast magnetic resonance imaging studies, with more than 220 reports from each study type. The recall (sensitivity) and precision (positive predictive value) of the NLP algorithm in extracting the BI-RADS final assessment category stated in the final report text were evaluated against manual human review as the reference standard. Across all breast imaging reports, the NLP algorithm identified BI-RADS final assessment categories with a recall of 100.0% (95% confidence interval [CI], 99.7 to 100.0%) and a precision of 96.6% (95% CI, 95.4 to 97.5%). The NLP algorithm thus demonstrated high recall and precision for extracting BI-RADS final assessment categories from the free text of breast imaging reports. NLP may provide an accurate, scalable mechanism for extracting data from reports within EMRs to build databases that track breast imaging performance measures and facilitate optimal breast cancer population management strategies.
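Recall and precision reduce to simple ratios of counts against the manual review reference standard: recall = TP / (TP + FN) and precision = TP / (TP + FP). The abstract does not name the method behind its confidence intervals, so the sketch below assumes the Wilson score interval as one common choice for a binomial proportion; its bounds need not match the published ones. The function names and the counts in the usage example are hypothetical.

```python
import math
from typing import Tuple


def recall_precision(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Recall (sensitivity) = TP/(TP+FN); precision (PPV) = TP/(TP+FP),
    with TP/FP/FN counted against the manual-review reference standard."""
    return tp / (tp + fn), tp / (tp + fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (ASSUMED method; the
    abstract does not say which interval was used)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts for illustration only, not from the study.
tp, fp, fn = 96, 4, 0
recall, precision = recall_precision(tp, fp, fn)
print("recall", recall, wilson_interval(tp, tp + fn))
print("precision", precision, wilson_interval(tp, tp + fp))
```

Unlike the simple normal (Wald) interval, the Wilson interval does not collapse to zero width when recall is exactly 100%, which is the situation reported above.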