Background
While skin cancers are less prevalent in people with skin of color, they are more often diagnosed at later stages and have a poorer prognosis. The use of artificial intelligence (AI) models can potentially improve early detection of skin cancers; however, the lack of skin color diversity in training datasets may widen the pre-existing racial discrepancies in dermatology.
Objective
To systematically review the technique, quality, accuracy, and implications of studies using AI models trained or tested in populations with skin of color for classification of pigmented skin lesions.
Methods
PubMed was searched to identify studies describing AI models for classification of pigmented skin lesions. Only studies that used training datasets with at least 10% of images from people with skin of color were eligible. Study population, AI model design, accuracy, and study quality were reviewed.
Results
Twenty-two eligible articles were identified. The majority of studies used models trained on datasets obtained from Chinese (7/22), Korean (5/22), and Japanese (3/22) populations. Seven studies used diverse datasets containing Fitzpatrick skin types I-III in combination with at least 10% of images from Black American, Native American, or Pacific Islander populations or from Fitzpatrick skin types IV-VI. AI models producing binary outcomes (e.g., benign vs malignant) reported accuracy ranging from 70% to 99.7%. Accuracy of AI models reporting multiclass outcomes (e.g., specific lesion diagnosis) was lower, ranging from 43% to 93%. Among reader studies, in which dermatologists’ classifications are compared with AI model outcomes, one reported similar accuracy, three reported higher AI accuracy, and two reported higher clinician accuracy. A quality review revealed that dataset description and variety, benchmarking, public evaluation, and healthcare application were frequently not addressed.
Conclusions
While this review provides promising evidence of accurate AI models in skin of color populations, large discrepancies remain between the number of AI models developed in populations with skin of color (particularly Fitzpatrick types IV-VI) and the number developed in populations with largely European ancestry. A lack of publicly available datasets from diverse populations is likely a contributing factor, as is the inadequate reporting of patient-level metadata relating to skin color in training datasets.