The upper tail problem in the Erdős-Rényi random graph $G \sim G_{n,p}$ asks to estimate the probability that the number of copies of a graph $H$ in $G$ exceeds its expectation by a factor $1 + \delta$. Chatterjee and Dembo showed that in the sparse regime of $p \to 0$ as $n \to \infty$ with $p \ge n^{-\alpha}$ for an explicit $\alpha = \alpha_H > 0$, this problem reduces to a natural variational problem on weighted graphs, which was thereafter asymptotically solved by two of the authors in the case where $H$ is a clique. Here we extend the latter work to any fixed graph $H$ and determine a function $c_H(\delta)$ such that, for $p$ as above and any fixed $\delta > 0$, the upper tail probability is $\exp[-(c_H(\delta) + o(1))\, n^2 p^{\Delta} \log(1/p)]$, where $\Delta$ is the maximum degree of $H$. As it turns out, the leading order constant in the large deviation rate function, $c_H(\delta)$, is governed by the independence polynomial of $H$, defined as $P_H(x) = \sum_k i_H(k)\, x^k$, where $i_H(k)$ is the number of independent sets of size $k$ in $H$. For instance, if $H$ is a regular graph on $m$ vertices, then $c_H(\delta)$ is the minimum between $\tfrac{1}{2}\delta^{2/m}$ and the unique positive solution of $P_H(x) = 1 + \delta$.
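The closing formula for regular graphs can be checked numerically on small examples. The following is a minimal sketch, not code from the paper: the function names (`independence_poly`, `c_regular`) are hypothetical, the graph is assumed to be given as `m` vertices labelled `0..m-1` with an edge list, regularity is assumed rather than checked, and the independence polynomial is computed by brute force, which is only feasible for small `m`.

```python
from itertools import combinations


def independence_poly(m, edges):
    """Return [i_H(0), ..., i_H(m)], counting independent sets by brute force."""
    edge_set = {frozenset(e) for e in edges}
    coeffs = [0] * (m + 1)
    for k in range(m + 1):
        for subset in combinations(range(m), k):
            # A subset is independent if no pair of its vertices is an edge.
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                coeffs[k] += 1
    return coeffs


def eval_poly(coeffs, x):
    """Evaluate P_H(x) = sum_k i_H(k) x^k."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def c_regular(m, edges, delta, tol=1e-12):
    """min( delta^(2/m) / 2, positive root of P_H(x) = 1 + delta ).

    Assumes H is regular, as in the abstract's example; this is not verified.
    """
    coeffs = independence_poly(m, edges)
    target = 1.0 + delta
    # P_H is increasing on x >= 0 with P_H(0) = 1, so bisection applies.
    lo, hi = 0.0, 1.0
    while eval_poly(coeffs, hi) < target:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if eval_poly(coeffs, mid) < target:
            lo = mid
        else:
            hi = mid
    root = (lo + hi) / 2.0
    return min(0.5 * delta ** (2.0 / m), root)


triangle = [(0, 1), (1, 2), (0, 2)]
print(independence_poly(3, triangle))   # coefficients of 1 + 3x
print(c_regular(3, triangle, 1.0))
```

For the triangle, $P_H(x) = 1 + 3x$, so the root is $\delta/3$ and the sketch returns $\min\{\tfrac{1}{2}\delta^{2/3}, \delta/3\}$, which is the $m = 3$ instance of the formula in the abstract.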
Summary
To identify the estimand in missing data problems and observational studies, it is common to base the statistical estimation on the ‘missingness at random’ and ‘no unmeasured confounder’ assumptions. However, these assumptions are unverifiable by using empirical data and pose serious threats to the validity of the qualitative conclusions of statistical inference. A sensitivity analysis asks how the conclusions may change if the unverifiable assumptions are violated to a certain degree. We consider a marginal sensitivity model which is a natural extension of Rosenbaum's sensitivity model that is widely used for matched observational studies. We aim to construct confidence intervals based on inverse probability weighting estimators, such that asymptotically the intervals have at least nominal coverage of the estimand whenever the data‐generating distribution is in the collection of marginal sensitivity models. We use a percentile bootstrap and a generalized minimax–maximin inequality to transform this intractable problem into a linear fractional programming problem, which can be solved very efficiently. We illustrate our method by using a real data set to estimate the causal effect of fish consumption on blood mercury level.
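To illustrate why a linear fractional program of this kind can be solved efficiently, here is a minimal sketch of a generic box-constrained version: maximise or minimise a weighted mean $\sum_i w_i y_i / \sum_i w_i$ when each weight is only known to lie in an interval. This is an illustrative stand-in, not the authors' procedure: the function names and the bounds `lo` and `hi` are hypothetical placeholders for whatever weight range a sensitivity model implies, and the bootstrap step is omitted. The sketch relies on the standard fact that the optimum puts the upper bound on the largest outcomes and the lower bound on the rest, so it suffices to sort once and scan the thresholds.

```python
def max_weighted_mean(y, lo, hi):
    """Maximise sum(w*y)/sum(w) subject to 0 < lo[i] <= w[i] <= hi[i]."""
    # Visit outcomes from largest to smallest.
    order = sorted(range(len(y)), key=lambda i: y[i], reverse=True)
    # Start with every weight at its lower bound.
    num = sum(lo[i] * y[i] for i in order)
    den = sum(lo)
    best = num / den
    # Raise weights to their upper bound one at a time, largest y first,
    # and keep the best ratio seen over all thresholds.
    for i in order:
        num += (hi[i] - lo[i]) * y[i]
        den += hi[i] - lo[i]
        best = max(best, num / den)
    return best


def min_weighted_mean(y, lo, hi):
    """Minimise the same ratio by negating the outcomes."""
    return -max_weighted_mean([-v for v in y], lo, hi)


y = [1.0, 2.0, 3.0]
lo = [1.0, 1.0, 1.0]   # hypothetical lower bounds on the weights
hi = [2.0, 2.0, 2.0]   # hypothetical upper bounds on the weights
print(min_weighted_mean(y, lo, hi), max_weighted_mean(y, lo, hi))
```

After the initial sort, each bound costs a single linear pass, which is what makes it practical to repeat the computation many times, for example once per bootstrap resample.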