Background Delays in the diagnosis of genetic syndromes are common, particularly in low-income and middle-income countries with limited access to genetic screening services. We therefore aimed to develop and evaluate a machine learning-based screening technology that uses facial photographs to assess a child's risk of presenting with a genetic syndrome, for use at the point of care.
Methods In this retrospective study, we developed a facial deep phenotyping technology based on deep neural networks and facial statistical shape models to screen children for genetic syndromes. We trained the machine learning models on facial photographs from children (aged <21 years) with a clinical or molecular diagnosis of a genetic syndrome and from controls without a genetic syndrome, matched for age, sex, and race or ethnicity. Images were obtained from three publicly available databases (the Atlas of Human Malformations in Diverse Populations of the National Human Genome Research Institute, Face2Gene, and the dataset available from Ferry and colleagues) and the archives of the Children's National Hospital (Washington, DC, USA), in addition to photographs taken on a standard smartphone at the Children's National Hospital. We designed a deep learning architecture structured into three neural networks, which performed image standardisation (Network A), facial morphology detection (Network B), and genetic syndrome risk estimation, accounting for phenotypic variations due to age, sex, and race or ethnicity (Network C). Data were divided randomly into 40 groups for cross-validation, and the performance of the model was evaluated in terms of accuracy, sensitivity, and specificity, both in the total population and stratified by race or ethnicity, age, and sex.
Findings Our dataset included 2800 facial photographs of children (1318 [47%] female and 1482 [53%] male; 1576 [56%] White, 432 [15%] African, 430 [15%] Hispanic, and 362 [13%] Asian). 1400 children with 128 genetic conditions were included (the most prevalent being Williams-Beuren syndrome [19%], Cornelia de Lange syndrome [17%], Down syndrome [16%], 22q11.2 deletion [13%], and Noonan syndrome [12%]), in addition to 1400 photographs of matched controls. In the total population, our deep learning-based model had an accuracy of 88% (95% CI 87-89) for the detection of a genetic syndrome, with 90% sensitivity (95% CI 88-92) and 86% specificity (95% CI 84-88). Accuracy was greater in White (90%, 89-91) and Hispanic populations (91%, 88-94) than in African (84%, 81-87) and Asian populations (82%, 78-86). Accuracy was similar in male (89%, 87-91) and female children (87%, 85-89), and in children younger than 2 years (86%, 84-88) and children aged 2 years or older (eg, 89% [87-91] for those aged 2 years to <5 years).
Interpretation This genetic screening technology could support early risk stratification at the point of care in global populations, which has the potential to accelerate diagnosis and reduce mortality and morbidity through preventive care.
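As a reading aid for the accuracy, sensitivity, and specificity reported in the Findings, the following is a minimal Python sketch of how such metrics and approximate 95% CIs can be derived from confusion-matrix counts. The function names (`proportion_ci`, `screening_metrics`) are hypothetical, the counts are back-calculated from the rounded percentages and the 1400/1400 case-control split rather than taken from the study's actual confusion matrix, and the normal (Wald) approximation is an assumption, because the abstract does not state how its intervals were computed.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion.

    Uses the normal (Wald) approximation; this is an assumed method,
    not necessarily the one used in the study.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def screening_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity with approximate 95% CIs."""
    return {
        "accuracy": proportion_ci(tp + tn, tp + fn + tn + fp),
        "sensitivity": proportion_ci(tp, tp + fn),  # true positives among cases
        "specificity": proportion_ci(tn, tn + fp),  # true negatives among controls
    }


# Illustrative counts only: reconstructed from 90% sensitivity and 86%
# specificity applied to 1400 cases and 1400 controls.
metrics = screening_metrics(tp=1260, fn=140, tn=1204, fp=196)
for name, (estimate, lower, upper) in metrics.items():
    print(f"{name}: {estimate:.0%} (95% CI {lower:.0%}-{upper:.0%})")
```

With these illustrative counts, overall accuracy is (1260 + 1204) / 2800 = 88%, which is consistent with the reported total-population figure. The same functions could be applied to each race or ethnicity, age, or sex subgroup if the per-group counts were available.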