BACKGROUND: With the emergence of web-based data collection methods, large digital health cohorts offer the opportunity to conduct behavioral and epidemiologic research at an unprecedented scale. The size and breadth of such data sets enable discovery of novel associations across the phenotypic spectrum.
METHODS: We deployed the digital symbol substitution test (DSST) online to consented 23andMe research participants 50-85 years of age. We tested cross-sectional associations between DSST performance and 824 phenotypes using linear regression models adjusted for age, sex, age × sex interaction, device, time of cohort entry, and ancestry, separately in discovery (n = 144,786) and replication (n = 93,428) samples; additional analyses further adjusted for education. We post-stratified association estimates on age, sex, and education to adjust for discrepancies across subsamples. Leveraging the rich genetic and phenotypic data available at 23andMe, we also estimated genetic and environmental correlations between DSST and its top correlates using linkage disequilibrium score regression.
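The abstract does not state how environmental correlations were obtained. One common approach, assumed here purely for illustration, decomposes the phenotypic correlation using the heritabilities and genetic correlation estimated by linkage disequilibrium score regression. The symbols below are not from the original: r_P is the phenotypic correlation, r_g the genetic correlation, r_e the environmental correlation, and h_1^2, h_2^2 the SNP heritabilities of DSST and the exposure.

```latex
% Assumed bivariate decomposition (not stated in the abstract)
r_P = r_g \sqrt{h_1^2 \, h_2^2} + r_e \sqrt{(1 - h_1^2)(1 - h_2^2)}
\quad\Longrightarrow\quad
r_e = \frac{r_P - r_g \sqrt{h_1^2 \, h_2^2}}{\sqrt{(1 - h_1^2)(1 - h_2^2)}}
```

Under this decomposition, r_e absorbs everything not captured by SNP heritability, including genetic variation that the genotyped variants do not tag.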
RESULTS: In the discovery phase, 97 phenotypes were significantly (false discovery rate < 0.05) and strongly (absolute standardized effect size > 0.5) associated with DSST performance. Of those, 60 (38 with additional adjustment for education) demonstrated both statistical significance and a consistent direction of association in the replication sample. The significantly associated phenotypes largely clustered into the following categories: psychiatric traits (e.g., anxiety, β per 1 SD = -0.74, P = 3.9 × 10⁻¹⁶⁹), education (e.g., highest math class completed, β per 1 SD = 2.11, P < 1.0 × 10⁻³⁰⁰), leisure activities (e.g., solitary activities like puzzles, β per 1 SD = 1.85, P < 1.0 × 10⁻³⁰⁰), social determinants (e.g., household income, β per 1 SD = 1.20, P = 8.9 × 10⁻²⁴⁵), and lifestyle (e.g., years smoked, β per 1 SD = -0.98, P = 2.2 × 10⁻⁷⁸). We identified several reproducible genetic correlations between DSST and its top associated exposures (e.g., 0.48 for leisure activities like puzzles, 0.28 for years of education, and -0.24 for anxiety; all P ≤ 7.9 × 10⁻²⁶). For almost all exposures, genetic correlations with DSST were considerably stronger than environmental correlations.
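The two-stage selection reported here (a false discovery rate and effect-size filter in discovery, then significance and consistent direction in replication) can be sketched as follows. The abstract does not name the false discovery rate procedure; this sketch assumes Benjamini-Hochberg. All function names and the toy inputs are hypothetical and are not study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract does not say which FDR method was used.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_hits(results, fdr=0.05, min_abs_beta=0.5):
    """Keep phenotypes with adjusted p < fdr and |beta| > min_abs_beta.

    `results` is a list of (name, standardized_beta, p_value) tuples.
    """
    q_values = benjamini_hochberg([p for _, _, p in results])
    return [
        name
        for (name, beta, _), q in zip(results, q_values)
        if q < fdr and abs(beta) > min_abs_beta
    ]


def replicated(discovery_beta, replication_beta, replication_q, fdr=0.05):
    """Significant in replication and same sign as the discovery estimate."""
    return replication_q < fdr and discovery_beta * replication_beta > 0


# Toy inputs for illustration only (not values from the study).
toy = [
    ("trait_a", 0.9, 0.001),
    ("trait_b", 0.1, 0.0001),
    ("trait_c", -0.7, 0.04),
    ("trait_d", 0.6, 0.5),
]
hits = select_hits(toy)
```

Keeping the effect-size filter separate from the significance filter matters at this sample size: with more than 100,000 participants, very small effects can pass a false discovery rate threshold, so the magnitude cutoff does most of the work of narrowing the list.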
CONCLUSIONS: We have conducted the largest study of cognitive performance to date, adding to the evidence for its correlations with many social, lifestyle, and clinical exposures. We established that the observed associations are in part underpinned by shared genetic architecture. Our study illustrates the potential of large-scale digital cohorts to contribute to epidemiologic discovery.