Highlights
• Multimodal diseases are those in which affected individuals can be divided into subtypes (or 'data modes'); for instance, 'mild' vs. 'severe', based on (unknown) modifiers of disease severity.
• The role of the microbiome in multimodal diseases has been studied in animals; however, findings are often deemed irreproducible, or unreasonably biased, with pathogenic roles in 95% of reports.
• As a solution to repeatability, investigators have been told to seek funds to increase the number of human-microbiome donors (N) and thereby improve the reproducibility of animal studies.
• Herein, we illustrate that although increasing N could help identify statistical effects (patterns of analytical irreproducibility), clinically-relevant information will not always be identified.
• Depending on which diseases need to be compared, 'random sampling' alone leads to reproducible 'patterns of analytical irreproducibility' in multimodal disease simulations.
• Instead of solely increasing N, we illustrate how disease multimodality could be understood, visualized and used to guide the study of diseases by selecting and focusing on 'disease modes'.
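The sampling argument in the highlights above can be illustrated with a minimal simulation sketch. It is not the simulation used in this study. The mode locations, the mixture proportions (30% vs. 60% 'severe'), the number of repeated studies, and the critical value of 1.96 (a normal approximation to the Welch t-test) are all illustrative assumptions, and the function names are hypothetical.

```python
import random
import statistics


def draw_bimodal(rng, n, p_severe, mild=(2.0, 1.0), severe=(8.0, 1.0)):
    """Draw n severity scores from a two-mode (mild/severe) Gaussian mixture.

    The (mean, sd) pairs for each mode are assumed values, not study data.
    """
    scores = []
    for _ in range(n):
        mu, sd = severe if rng.random() < p_severe else mild
        scores.append(rng.gauss(mu, sd))
    return scores


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs n >= 2)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def fraction_significant(n_donors, n_studies=2000, seed=0, t_crit=1.96):
    """Fraction of repeated simulated studies in which |t| exceeds t_crit.

    Each simulated study compares two hypothetical diseases that share the
    same two modes but differ in the proportion of 'severe' donors.
    t_crit=1.96 is a normal approximation, which is crude for small N.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        disease_a = draw_bimodal(rng, n_donors, p_severe=0.3)
        disease_b = draw_bimodal(rng, n_donors, p_severe=0.6)
        if abs(welch_t(disease_a, disease_b)) > t_crit:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for n in (3, 6, 12, 24, 48):
        print(n, fraction_significant(n))
```

Under these assumptions, repeated studies with the same N disagree with one another purely because each random draw contains a different share of donors from each mode. Increasing N raises the fraction of 'significant' comparisons, but it does not by itself reveal which mode drives the difference.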
Abstract
Multimodal diseases are those in which affected individuals can be divided into subtypes (or 'data modes'); for instance, 'mild' vs. 'severe', based on (unknown) modifiers of disease severity. Studies have shown that despite the inclusion of a large number of subjects, the causal role of the microbiome in human diseases remains uncertain. The role of the microbiome in multimodal diseases has been studied in animals; however, findings are often deemed irreproducible, or unreasonably biased, with pathogenic roles in 95% of reports. As a solution to repeatability, investigators have been told to seek funds to increase the number of human-microbiome donors (N) to increase the reproducibility of animal studies (doi:10.1016/j.cell.2019.12.025). Herein, through simulations, we illustrate that increasing N will not universally enable the identification of consistent statistical differences (patterns of analytical irreproducibility), due to random sampling from a population with ample variability in disease and the presence of 'disease data subtypes' (or modes). We also found that studies do not use cluster statistics when needed (97.4%, 37/38, 95%CI=86.5,99.5), and that scientists who increased N concurrently reduced the number of mice/donor (y=-0.21x, R²=0.24; and vice versa), indicating that, statistically, scientists replace the disease variance in mice with the variance of human disease. Instead of assuming that increasing N will solve reproducibility and identify clinically-predictive findings on causality, we propose the visualization of data distribution using kernel-density-violin plots (rarely used in rodent studies; 0%, 0/38, 95%CI=6.9e-18,9.1) to identify 'disease data subtypes' to self-correct, guide and promote the personalized investigation of disease subtype mechanisms.
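The abstract does not state how the 95% confidence intervals for the reported proportions were computed. As a hedged illustration, the sketch below assumes a Wilson score interval, which reproduces the 86.5 to 99.5% interval for 37/38 to one decimal place and gives an upper bound of about 9.2% for 0/38, close to the reported 9.1%. The function name `wilson_interval` is hypothetical, and the choice of method is an assumption rather than the documented procedure.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z=1.959964 corresponds to a two-sided 95% interval. Using the Wilson
    method here is an assumption; the source does not name its method.
    """
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for k in (37, 0):
        lo, hi = wilson_interval(k, 38)
        print(f"{k}/38: {100 * lo:.1f}% to {100 * hi:.1f}%")
```

For a proportion of exactly 0, the lower bound is 0 up to floating-point rounding, which would be consistent with the very small reported lower limit (6.9e-18).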
Key...