Summary
More than 100 genetic etiologies have been identified in developmental and epileptic encephalopathies (DEEs), but correlating genetic findings with clinical features at scale has remained a hurdle because of a lack of frameworks for analyzing heterogeneous clinical data. Here, we analyzed 31,742 Human Phenotype Ontology (HPO) terms in 846 individuals with existing whole-exome trio data and assessed associated clinical features and phenotypic relatedness by using HPO-based semantic similarity analysis for individuals with de novo variants in the same gene. Gene-specific phenotypic signatures included associations of SCN1A with “complex febrile seizures” (HP:0011172; p = 2.1 × 10⁻⁵) and “focal clonic seizures” (HP:0002266; p = 8.9 × 10⁻⁶), STXBP1 with “absent speech” (HP:0001344; p = 1.3 × 10⁻¹¹), and SLC6A1 with “EEG with generalized slow activity” (HP:0010845; p = 0.018). Of 41 genes with de novo variants in two or more individuals, 11 genes showed significant phenotypic similarity, including SCN1A (n = 16, p < 0.0001), STXBP1 (n = 14, p = 0.0021), and KCNB1 (n = 6, p = 0.011). Including genetic and phenotypic data of control subjects increased phenotypic similarity for all genetic etiologies, whereas the probability of observing de novo variants decreased, emphasizing the conceptual differences between semantic similarity analysis and approaches based on the expected number of de novo events. We demonstrate that HPO-based phenotype analysis captures unique profiles for distinct genetic etiologies, reflecting the breadth of the phenotypic spectrum in genetic epilepsies. Semantic similarity can be used to generate statistical evidence for disease causation analogous to the traditional approach of primarily defining disease entities through similar clinical features.