Introduction
Causal feature selection entails identifying confounders that eliminate confounding bias when estimating effects from observational data. Traditionally, researchers employ expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity while conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, erroneously conditioning on variables playing multiple roles introduces bias. In a use case studying depression as a potential independent risk factor for Alzheimer’s disease (AD), we introduce a novel knowledge graph application enabling causal feature selection from computable literature-derived knowledge and biomedical ontologies to address these challenges.

Methods
Using the output from three machine reading systems, we harmonized the computable knowledge extracted from a scoped literature corpus. Next, we applied logical closure operations to infer missing knowledge and mapped the outputs to target terminologies. We then combined the outputs with ontology-grounded resources using a robust KG framework developed by computational biologists. Next, we translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the variables identified. Finally, we analyzed a selection of variables and reasoning paths in the search results.

Results
Confounder search yielded 128 confounders, including 58 phenotypes, 47 drugs, and 35 genes. Search also identified 23 collider and 16 mediator phenotypes. Only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders. The remaining 27 phenotypes also play other roles, and 7 of the 21 confounders identified by both the KG and the literature were identified as being exclusively confounders.
Stroke was an example of a variable playing all three roles.

Discussion
Our findings suggest that our KG application could augment human expertise while confirming the complexity of selecting potential confounders for depression with AD. Imperfect concept mapping introduced errors, and the small literature corpus limited the scope of search results.

Conclusion
Our results suggest that our method may apply widely to causal feature selection. However, the search results need to be reviewed by human experts and tested empirically, and further work is required to optimize KG output for human consumption.

Highlights
• Knowledge of causal variables and their roles is essential for causal inference.
• We show how to search a knowledge graph (KG) for causal variables and their roles.
• The KG combines literature-derived knowledge with ontology-grounded knowledge.
• We design queries to search the KG for confounder, collider, and mediator roles.
• KG search reveals variables in these roles for depression and Alzheimer’s disease.

Graphical abstract
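
As an illustration of the Methods step that translates the epidemiological definitions of confounder, collider, and mediator into KG queries, the following is a minimal sketch rather than the queries used in this work. It assumes the KG can be reduced to a list of directed "cause -> effect" edges and considers only direct (single-hop) relations. The function names, the `TOY_EDGES` data, and the placeholder variables (`v_conf`, `v_med`, `v_coll`, `v_multi`) are hypothetical and do not come from the search results.

```python
from collections import defaultdict


def find_roles(edges, exposure, outcome):
    """Classify variables by role relative to an exposure/outcome pair.

    edges: iterable of (cause, effect) pairs, each read as "cause -> effect".
    Assumption: only direct (single-hop) edges are considered in this sketch.
    """
    causes_of = defaultdict(set)   # effect -> set of its direct causes
    effects_of = defaultdict(set)  # cause  -> set of its direct effects
    for cause, effect in edges:
        causes_of[effect].add(cause)
        effects_of[cause].add(effect)

    excluded = {exposure, outcome}
    return {
        # Confounder: a common cause of the exposure and the outcome.
        "confounder": (causes_of[exposure] & causes_of[outcome]) - excluded,
        # Collider: a common effect of the exposure and the outcome.
        "collider": (effects_of[exposure] & effects_of[outcome]) - excluded,
        # Mediator: lies on the path exposure -> mediator -> outcome.
        "mediator": (effects_of[exposure] & causes_of[outcome]) - excluded,
    }


def exclusive_confounders(roles):
    """Confounders that do not also appear as colliders or mediators."""
    return roles["confounder"] - roles["collider"] - roles["mediator"]


# Hypothetical toy edges; placeholders only, not findings from the study.
TOY_EDGES = [
    ("v_conf", "depression"), ("v_conf", "alzheimers_disease"),
    ("depression", "v_med"), ("v_med", "alzheimers_disease"),
    ("depression", "v_coll"), ("alzheimers_disease", "v_coll"),
    # v_multi is linked in both directions, so it matches all three patterns.
    ("v_multi", "depression"), ("v_multi", "alzheimers_disease"),
    ("depression", "v_multi"), ("alzheimers_disease", "v_multi"),
]

roles = find_roles(TOY_EDGES, "depression", "alzheimers_disease")
print(roles)
print(exclusive_confounders(roles))
```

In this toy graph, `v_multi` satisfies all three definitions, which mirrors the situation reported for variables playing multiple roles. Subtracting the collider and mediator sets from the confounder set is one simple way to isolate variables that behave exclusively as confounders.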