Background
Understanding of cancer outcomes is limited by data fragmentation. We analyzed the information gained by integrating breast cancer data from three sources: the electronic medical records (EMRs) of two healthcare systems and the state cancer registry.
Methods
We extracted diagnostic test and treatment data from the EMRs of all breast cancer patients treated between 2000 and 2010 at two independent California institutions: a community-based practice (Palo Alto Medical Foundation) and an academic medical center (Stanford University). We incorporated records from the population-based California Cancer Registry (CCR) and then linked the EMR-CCR datasets of Community and University patients.
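The linkage step can be sketched as set operations on patient identifiers. This is a minimal sketch, assuming exact matching on a shared registry identifier (the `registry_id` field is hypothetical; real EMR-registry linkage may instead use probabilistic matching on demographics). The overlap denominator (unique patients) is also an assumption, since the abstract does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    registry_id: str  # hypothetical shared key, e.g. a registry case number
    institution: str


def link_institutions(community, university):
    """Exact-match linkage on a shared identifier (a simplifying assumption)."""
    comm_ids = {r.registry_id for r in community}
    univ_ids = {r.registry_id for r in university}
    both = comm_ids & univ_ids      # patients seen at both institutions
    unique = comm_ids | univ_ids    # each patient counted once
    return {
        "community_only": comm_ids - univ_ids,
        "university_only": univ_ids - comm_ids,
        "both": both,
        "unique_total": len(unique),
        # Assumed denominator: all unique linked patients
        "overlap_fraction": len(both) / len(unique) if unique else 0.0,
    }


# Toy records invented for illustration; these are not study data
community = [PatientRecord("A1", "Community"),
             PatientRecord("B2", "Community"),
             PatientRecord("C3", "Community")]
university = [PatientRecord("C3", "University"),
              PatientRecord("D4", "University")]

result = link_institutions(community, university)
print(result["unique_total"], sorted(result["both"]), result["overlap_fraction"])
# prints: 4 ['C3'] 0.25
```

Without linkage, patient `C3` would be counted once in each institution's dataset; after linkage the combined count drops from 5 records to 4 unique patients.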
Results
We initially identified 8210 University patients and 5770 Community patients; linking the datasets revealed a 16% patient overlap, yielding 12,109 unique patients. Among Community patients, but not University patients, the proportion treated at both institutions increased with worsening prognostic factors. Before the datasets were linked, Community patients appeared to receive less intervention than University patients (mastectomy: 37.6% versus 43.2%; chemotherapy: 35% versus 41.7%; magnetic resonance imaging (MRI): 10% versus 29.3%; genetic testing: 2.5% versus 9.2%). The linked Community and University datasets revealed that patients treated at both institutions received substantially more intervention (mastectomy: 55.8%; chemotherapy: 47.2%; MRI: 38.9%; genetic testing: 10.9%; p<0.001 for each three-way institutional comparison).
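The shift from a two-way to a three-way comparison can be sketched as follows: once records are linked, each patient falls into exactly one group (Community only, University only, or both), and intervention rates are computed per group. This is an illustrative sketch only; the record layout (`sites`, `interventions`) and the toy patients are hypothetical and do not reproduce the study's data or statistical analysis.

```python
from collections import defaultdict


def treatment_group(seen_at):
    """Assign a patient to one of three mutually exclusive groups by sites seen."""
    if seen_at == {"Community", "University"}:
        return "Both"
    if seen_at == {"Community"}:
        return "Community only"
    if seen_at == {"University"}:
        return "University only"
    raise ValueError(f"unexpected sites: {seen_at}")


def intervention_rates(patients, intervention):
    """Proportion of patients in each group who received the given intervention."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_with_intervention, n_total]
    for p in patients:
        group = treatment_group(p["sites"])
        counts[group][1] += 1
        if intervention in p["interventions"]:
            counts[group][0] += 1
    return {g: k / n for g, (k, n) in counts.items()}


# Hypothetical linked records, invented for illustration
patients = [
    {"sites": {"Community"}, "interventions": set()},
    {"sites": {"Community"}, "interventions": {"mri"}},
    {"sites": {"University"}, "interventions": {"mri"}},
    {"sites": {"Community", "University"}, "interventions": {"mri", "mastectomy"}},
]

print(intervention_rates(patients, "mri"))
# prints: {'Community only': 0.5, 'University only': 1.0, 'Both': 1.0}
```

Before linkage, the dual-site patient would appear in both institutional datasets and inflate each one's rates; separating that patient into a third group is what makes the higher intervention rates of dual-site patients visible.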
Conclusion
Data linkage identified a subset of patients (16%) who were treated in both healthcare systems and who, despite comparable prognostic factors, received far more intensive treatment than other patients. By integrating complementary data from EMRs and population-based registries, we obtained a more comprehensive understanding of breast cancer care and of the factors that drive treatment utilization.