Life course epidemiology relies on specifying complex (causal) models that describe how variables interact over time. Traditionally, such models have been constructed by consulting existing theory and previous studies. By comparing data-driven and theory-driven models, we investigate whether data-driven causal discovery algorithms can aid this process. We focus on a longitudinal dataset following a cohort of Danish men. The theory-driven models are constructed by two subject-field experts. The data-driven models are constructed using the temporal Peter-Clark (TPC) algorithm, which utilizes the temporal information embedded in life course data.
We find that the data-driven models recover some, but not all, causal relationships included in the theory-driven expert models. The data-driven method is especially good at identifying direct causal relationships that the experts have high confidence in. Moreover, in a post-hoc assessment, we find that most of the direct causal relationships proposed by the data-driven model, but not included in the theory-driven model, are plausible. Thus, the data-driven model may propose additional meaningful causal hypotheses that are new or have been overlooked by the experts. In conclusion, data-driven methods can aid causal model construction in life course epidemiology, and combining data-driven and theory-driven methods can lead to even stronger models.