Abstract. Let A be an isotropic, sub-gaussian m × n matrix. We prove that the process Z_x := ‖Ax‖₂ − √m‖x‖₂ has sub-gaussian increments, that is, ‖Z_x − Z_y‖_{ψ₂} ≤ C‖x − y‖₂ for any x, y ∈ Rⁿ. Using this, we show that for any bounded set T ⊆ Rⁿ, the deviation of ‖Ax‖₂ around its mean is uniformly bounded by the Gaussian complexity of T. We also prove a local version of this theorem, which allows for unbounded sets. These theorems have various applications, some of which are reviewed in this paper. In particular, we give a new result regarding model selection in the constrained linear model.
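In the notation of the abstract, the uniform deviation bound has the following natural form; this is a sketch of the statement, where C is an unspecified absolute constant, K denotes the sub-gaussian norm of the rows of A, and γ(T) is the Gaussian complexity (both symbols are our labels, not quoted from the paper):

```latex
\mathbb{E}\,\sup_{x \in T}\,\Bigl|\,\|Ax\|_2 - \sqrt{m}\,\|x\|_2\,\Bigr|
\;\le\; C K^2 \,\gamma(T),
\qquad
\gamma(T) := \mathbb{E}\,\sup_{x \in T}\,|\langle g, x\rangle|,
\quad g \sim N(0, I_n).
```

The sub-gaussian increment property ‖Z_x − Z_y‖_{ψ₂} ≤ C‖x − y‖₂ yields such a bound via a standard chaining argument over T.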
We introduce a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a compression scheme can be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions.

As an application of this technique, we prove that Θ̃(kd²/ε²) samples are necessary and sufficient for learning a mixture of k Gaussians in Rᵈ, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(kd/ε²) samples suffice, matching a known lower bound. Moreover, these results hold in an agnostic learning (or robust estimation) setting, in which the target distribution is only approximately a mixture of Gaussians. Our main upper bound is proven by showing that the class of Gaussians in Rᵈ admits a small compression scheme.
MapReduce has become the de facto standard model for designing distributed algorithms to process big data on a cluster. There has been considerable research on designing efficient MapReduce algorithms for clustering, graph optimization, and submodular optimization problems. We develop new techniques for designing greedy and local ratio algorithms in this setting. Our randomized local ratio technique gives 2-approximations for weighted vertex cover and weighted matching, and an f -approximation for weighted set cover, all in a constant number of MapReduce rounds. Our randomized greedy technique gives algorithms for maximal independent set, maximal clique, and a (1+ε) ln ∆-approximation for weighted set cover. We also give greedy algorithms for vertex colouring with (1 + o(1))∆ colours and edge colouring with (1 + o(1))∆ colours.
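To illustrate the local ratio idea underlying the 2-approximation for weighted vertex cover, here is a minimal sequential sketch of the classic local ratio rule; the paper's actual contribution is a randomized MapReduce variant, which is not reproduced here, and the function and variable names below are ours:

```python
def local_ratio_vertex_cover(edges, weight):
    """2-approximate weighted vertex cover via the local ratio rule.

    edges:  iterable of (u, v) vertex pairs
    weight: dict mapping each vertex to its nonnegative weight
    """
    residual = dict(weight)  # remaining (unpaid) weight of each vertex
    for u, v in edges:
        # Pay down both endpoints by the smaller residual weight;
        # after this step at least one endpoint has residual zero.
        delta = min(residual[u], residual[v])
        residual[u] -= delta
        residual[v] -= delta
    # Vertices whose weight was fully paid down form a valid cover,
    # and its total weight is at most twice the optimum.
    return {v for v, w in residual.items() if w == 0}
```

Every edge forces one of its endpoints to residual zero, so the returned set is a cover; the randomized MapReduce version processes many such edge payments per round to finish in a constant number of rounds.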