technique on a large-scale clinical study (UK Biobank), which includes imaging, cognitive,
and clinical assessment data. The UK Biobank archive presents several difficult challenges related to the aggregation, harmonization, modeling, and interrogation of the information.
These problems are related to the complex longitudinal structure, feature heterogeneity, multicollinearity, incongruency, and missingness, as well as violations of classical parametric assumptions that require novel health analytical approaches.
Our results showcase the scalability, efficiency, and potential of CBDA to compress complex data into structural information leading to derived knowledge and translational action. The results of the real case study suggest new and exciting avenues of research in the context of identifying, tracking, and treating mental health and aging-related disorders. Following open-science principles, we share the entire end-to-end protocol, source code, and results. This facilitates independent validation, result reproducibility, and team-based collaborative discovery.

Figure 5: Dissimilarity and variance analysis of the coefficient/weight distributions of the ensemble predictor. Analysis of the Binomial (Panels A and C) and Null (Panels B and D) datasets (each with 10,000 cases and 1,000 features). The x axis displays the number of top-ranked models (from 50 to 5,000). The y axis shows the mean Bray-Curtis dissimilarity within the SuperLearner coefficients (Panels A and B) and the variance of the SuperLearner coefficients (Panels C and D).
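The two quantities plotted in Figure 5 can be illustrated with a minimal Python sketch. It is not the CBDA implementation. It assumes that each top-ranked model contributes one vector of non-negative SuperLearner weights, that "mean Bray-Curtis dissimilarity" is the average over all pairs of those vectors, and that "variance" is the per-learner variance across models, averaged over learners. The function names and the synthetic Dirichlet weights are hypothetical and used only for illustration.

```python
import numpy as np


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u - v| / sum|u + v|."""
    denom = np.sum(np.abs(u + v))
    return 0.0 if denom == 0 else float(np.sum(np.abs(u - v)) / denom)


def mean_pairwise_bray_curtis(weights):
    """Average Bray-Curtis dissimilarity over all pairs of rows."""
    n = weights.shape[0]
    dists = [
        bray_curtis(weights[i], weights[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(dists))


def summarize_top_k(weights, k):
    """Return (mean dissimilarity, mean per-learner variance) for the top k rows.

    Assumes the rows of `weights` are already sorted by model rank.
    """
    top = weights[:k]
    mean_variance = float(np.mean(np.var(top, axis=0)))
    return mean_pairwise_bray_curtis(top), mean_variance


# Synthetic stand-in for SuperLearner weights: 200 models, 5 learners,
# each row non-negative and summing to 1 (an assumption, not real output).
rng = np.random.default_rng(0)
weights = rng.dirichlet(np.ones(5), size=200)

for k in (50, 100, 200):
    dissimilarity, variance = summarize_top_k(weights, k)
    print(k, round(dissimilarity, 4), round(variance, 4))
```

Evaluating both summaries over an increasing number of top-ranked models gives one curve per panel type: dissimilarity as in Panels A and B, and variance as in Panels C and D.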