Purpose
To characterize the demographic distributions of The Cancer Imaging
Archive (TCIA) studies and compare them with those of the U.S. cancer
population.
Materials and Methods
In this retrospective study, data from TCIA studies were examined for the
inclusion of demographic information. Of 189 studies in TCIA up until
April 2023, a total of 83 human cancer studies were found to contain
supporting demographic data. The median patient age and the sex, race,
and ethnicity proportions of each study were calculated and compared
with those of the U.S. cancer population, provided by the Surveillance,
Epidemiology, and End Results Program and the Centers for Disease
Control and Prevention U.S. Cancer Statistics Data Visualizations
Tool.
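As an illustration of the comparison described above, the following is a minimal sketch, not the authors' code. It assumes that each study's group proportions are compared with a reference proportion as a relative percentage difference; the abstract does not state whether the reported percentages are relative or absolute. The function names, counts, ages, and reference values are all hypothetical.

```python
from statistics import median


def proportions(counts):
    """Convert per-group patient counts into proportions of the study total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no patients with recorded values")
    return {group: n / total for group, n in counts.items()}


def relative_difference(observed, reference):
    """Percentage difference of an observed proportion relative to a reference.

    Negative values indicate underrepresentation (an assumed definition).
    """
    return (observed - reference) / reference * 100.0


# Hypothetical counts for one study and hypothetical reference proportions.
study_sex_counts = {"female": 60, "male": 40}
reference_sex = {"female": 0.50, "male": 0.50}

observed = proportions(study_sex_counts)
gaps = {g: relative_difference(observed[g], reference_sex[g]) for g in observed}

# Median age of a hypothetical study cohort.
study_median_age = median([52, 61, 58, 70, 66])
```

Under this assumed definition, a group that makes up 40% of a study but 50% of the reference population would be underrepresented by 20%.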
Results
The median age of TCIA patients was 6.84 years lower than that of the
U.S. cancer population (P = .047), and TCIA studies contained more
female than male patients (53% vs 47%). American Indian
and Alaska Native, Black or African American, and Hispanic patients were
underrepresented in TCIA studies by 47.7%, 35.8%, and 14.7%,
respectively, compared with the U.S. cancer population.
Conclusion
The results demonstrate that the patient demographics of TCIA data sets
do not reflect those of the U.S. cancer population, which may decrease
the generalizability of artificial intelligence radiology tools
developed using these imaging data sets.
Keywords: Ethics, Meta-Analysis, Health Disparities, Cancer
Health Disparities, Machine Learning, Artificial Intelligence, Race,
Ethnicity, Sex, Age, Bias
Published under a CC BY 4.0 license.
See also the commentary by Miles and Porras in this issue.