2021
DOI: 10.48550/arxiv.2106.11234
Preprint

A causal view on compositional data

Abstract: Many scientific datasets are compositional in nature. Important examples include species abundances in ecology, rock compositions in geology, topic compositions in large-scale text corpora, and sequencing count data in molecular biology. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. Throughout, we pay particular attention to the interpretation of compositional causes from the viewpoint of interventions and crisply articulate po…

Cited by 4 publications (4 citation statements)
References 59 publications

“…In a diet intervention study, for instance, it is not unlikely that the intended direct effect on host health is mediated or confounded by the presence of certain microbes in the microbiome. While the instrumental variable (IV) approach [67] provides a powerful framework to uncover causal effects (see also Ailer et al. [68] in the context of microbiome data), it requires that the instruments are strong and not confounded. A standard IV approach for continuous data estimates the parameters using two-stage least squares.…”
Section: Discussion (mentioning)
confidence: 99%
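The two-stage least squares (2SLS) estimator mentioned in the statement above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated, non-compositional data, assuming one continuous exposure, one instrument, and an unobserved confounder; the function name `two_stage_least_squares` and every simulation parameter are hypothetical and not taken from either paper. It returns point estimates only, since naive second-stage standard errors would be incorrect.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """Point estimate of beta in y = X @ beta + error using instruments Z.

    X is (n, p), Z is (n, q) with q >= p; include a column of ones in both
    for an intercept. Standard errors are not computed here.
    """
    # Stage 1: regress each column of X on the instruments, keep fitted values.
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Stage 2: regress the outcome on the stage-1 fitted values.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta

# Hypothetical simulation: u confounds x and y; z shifts x but affects y only through x.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect of x on y is 2.0

ones = np.ones(n)
X = np.column_stack([ones, x])
Z = np.column_stack([ones, z])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # naive regression, biased by u
beta_iv = two_stage_least_squares(y, X, Z)
print(f"OLS slope: {beta_ols[1]:.2f}, 2SLS slope: {beta_iv[1]:.2f}")
```

With these settings the naive slope is pulled well above 2 by the confounder (its large-sample limit is about 3.1), while the 2SLS slope should land close to 2. The paper itself treats the harder case in which the cause is a composition, which this sketch does not address.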
“…1). To enhance the interpretability of these log-contrasts and address potential issues regarding causality (Ailer et al., 2021), we incorporated the count of all topological roles into the equation. This ensured that an increase in the Connector balance was necessarily driven by an increase in the absolute count of connector nodes.…”
Section: Compositional Analysis Of the Language Connectome (mentioning)
confidence: 99%
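The interpretability concern in the statement above can be illustrated with an isometric log-ratio balance of one part against all the others, b = sqrt(r·s/(r+s)) · log(g(numerator)/g(denominator)), where g is the geometric mean and r, s are the numbers of parts on each side. The sketch below uses hypothetical role counts (the citing study's actual roles, counts, and model are not reproduced here; index 0 simply plays the "connector" role) to show that the balance rises by the same amount whether connector nodes double or all other roles halve. Only the total count tells the two scenarios apart.

```python
import numpy as np

def balance(counts, num_idx):
    """Isometric log-ratio balance of the parts in num_idx versus all remaining parts:
    b = sqrt(r*s/(r+s)) * (mean log numerator - mean log denominator)."""
    counts = np.asarray(counts, dtype=float)
    mask = np.zeros(counts.size, dtype=bool)
    mask[list(num_idx)] = True
    r, s = mask.sum(), (~mask).sum()
    log_c = np.log(counts)  # mean of logs = log of geometric mean
    return np.sqrt(r * s / (r + s)) * (log_c[mask].mean() - log_c[~mask].mean())

# Hypothetical node counts per topological role; index 0 is the "connector" role.
baseline = np.array([10.0, 20.0, 40.0, 30.0])
more_connectors = baseline * np.array([2.0, 1.0, 1.0, 1.0])  # connectors double
fewer_others = baseline * np.array([1.0, 0.5, 0.5, 0.5])     # every other role halves

b0 = balance(baseline, [0])
b1 = balance(more_connectors, [0])
b2 = balance(fewer_others, [0])
print(f"balance change: {b1 - b0:.3f} vs {b2 - b0:.3f}")
print("total counts:", baseline.sum(), more_connectors.sum(), fewer_others.sum())
```

Both scenarios raise the balance by sqrt(3/4)·log 2, but the total count grows in one and shrinks in the other, which is why carrying the total alongside the balance can pin down which change occurred.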
“…An attractive property of the log-contrast model is that its coefficients quantify the effect of a multiplicative perturbation (i.e., fractionally increasing one component while adjusting the others) on the response. While several extensions of the log-contrast model exist (e.g., [13–17]), its parametric approach to supervised learning has two major shortcomings that become particularly severe when applied to high-dimensional and zero-inflated high-throughput sequencing data [18,19]. Firstly, since the logarithm is not defined at zero, the log-contrast model cannot be directly applied.…”
Section: Introduction (mentioning)
confidence: 99%
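The perturbation property described in the statement above follows from the zero-sum constraint on the log-contrast coefficients in y = β_0 + Σ_j β_j log x_j with Σ_j β_j = 0: multiplying part k by a factor c and re-closing the composition shifts every log-part by the same constant, which cancels, so the response changes by exactly β_k · log c. The sketch below checks this numerically with hypothetical coefficients and rejects zero parts, reflecting the first shortcoming named in the quote. The helper names are illustrative and not from any cited package.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def log_contrast_predict(x, beta, intercept=0.0):
    """Log-contrast model: y = intercept + sum_j beta_j * log(x_j), with sum_j beta_j = 0."""
    beta = np.asarray(beta, dtype=float)
    if not np.isclose(beta.sum(), 0.0):
        raise ValueError("log-contrast coefficients must sum to zero")
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        # log(0) is undefined, so zero parts cannot enter the model directly
        raise ValueError("all parts must be strictly positive")
    return intercept + float(beta @ np.log(x))

def perturb(x, k, factor):
    """Multiply part k by `factor`, then re-close; the other parts shrink proportionally."""
    x = np.asarray(x, dtype=float).copy()
    x[k] *= factor
    return closure(x)

x = closure([0.5, 0.3, 0.2])
beta = np.array([1.5, -0.5, -1.0])  # hypothetical coefficients summing to zero
k, factor = 0, 2.0
delta = log_contrast_predict(perturb(x, k, factor), beta) - log_contrast_predict(x, beta)
print(f"change in response: {delta:.4f}, beta_k * log(c): {beta[k] * np.log(factor):.4f}")
```

Because the normalizing constant from re-closure multiplies Σ_j β_j = 0, the result does not depend on which other parts absorb the change, which is what makes the coefficient readable as a perturbation effect.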