In this paper, we study two problems: (1) estimation of a d-dimensional log-concave distribution and (2) bounded multivariate convex regression with random design, with an underlying log-concave density or a compactly supported distribution with a continuous density. First, we show that for all d ≥ 4 the maximum likelihood estimators of both problems achieve an optimal risk of Θ_d(n^{-2/(d+1)})* (up to a logarithmic factor) in terms of squared Hellinger distance and squared L_2 distance, respectively. Previously, the optimality of both these estimators was known only for d ≤ 3. We also prove that the ε-entropy numbers of the two aforementioned families are equal up to logarithmic factors. We complement these results by proving a sharp bound Θ_d(n^{-2/(d+4)}) on the minimax rate (up to logarithmic factors) with respect to the total variation distance. Finally, we prove that estimating a log-concave density, even a uniform distribution on a convex set, up to a fixed accuracy requires a number of samples that is at least exponential in the dimension. We do that by improving the dimensional constant in the best known lower bound for the minimax rate from 2[…].

The covering number of a function class provides, under appropriate conditions, the global minimax rates for estimation with respect to squared L_2(P) (for regression) and squared Hellinger (for density estimation) measures of closeness [Yang and Barron, 1999]. Here N_2(F, ε, P) is the covering number of F with respect to L_2(P) at scale ε, defined as the smallest number of functions f_1, …, f_N ∈ F such that for every f ∈ F there exists j with ‖f − f_j‖_{L_2(P)} ≤ ε.

* In the regression setting, this bound is tight for certain measures, e.g., when the underlying distribution is uniform on a ball. However, for some log-concave measures, the minimax rate is of order Θ_d(n^{…}).

¹ We acknowledge the work by Han [2019], which appeared a few months after our initial manuscript became available on arXiv. From a recent personal communication with the author, some of his results were achieved in his PhD thesis, which was available online before our initial manuscript. The author used techniques that are very similar to our approach.

² See Remarks 1 and 2 for more details.
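As a reading aid, the entropy-to-rate connection invoked above [Yang and Barron, 1999] can be sketched by the standard balance equation below; the specific entropy growth ε^{-(d-1)} in the second display is assumed here only for illustration and is not taken from the abstract.

```latex
% Yang-Barron style balance (informal sketch): the minimax squared risk
% \varepsilon_n^2 is obtained by balancing metric entropy against sample size,
\[
  n\,\varepsilon_n^{2} \;\asymp\; \log N_2(\mathcal{F}, \varepsilon_n, P).
\]
% Illustration only (assumed entropy growth): if
% $\log N_2(\mathcal{F}, \varepsilon, P) \asymp \varepsilon^{-(d-1)}$, then
\[
  n\,\varepsilon_n^{2} \;\asymp\; \varepsilon_n^{-(d-1)}
  \;\Longrightarrow\;
  \varepsilon_n \asymp n^{-1/(d+1)}
  \;\Longrightarrow\;
  \varepsilon_n^{2} \asymp n^{-2/(d+1)},
\]
% which recovers a rate of the same form as the $\Theta_d(n^{-2/(d+1)})$
% stated in the abstract.
```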
Much of modern learning theory has been split between two regimes: the classical offline setting, where data arrive independently, and the online setting, where data arrive adversarially. While the former model is often both computationally and statistically tractable, the latter requires no distributional assumptions. In an attempt to achieve the best of both worlds, previous work proposed the smooth online setting, where each sample is drawn from an adversarially chosen distribution that is smooth, i.e., has a bounded density with respect to a fixed dominating measure. Existing results for the smooth setting were known only for binary-valued function classes and were computationally expensive in general; in this paper, we fill these lacunae. In particular, we provide tight bounds on the minimax regret of learning a nonparametric function class, with nearly optimal dependence on both the horizon and smoothness parameters. Furthermore, we provide the first oracle-efficient, no-regret algorithms in this setting. Specifically, we propose an oracle-efficient improper algorithm whose regret achieves optimal dependence on the horizon, and a proper algorithm requiring only a single oracle call per round whose regret has the optimal horizon dependence in the classification setting and is sublinear in general. Both algorithms have exponentially worse dependence on the smoothness parameter of the adversary than the minimax rate. We then prove a lower bound on the oracle complexity of any proper learning algorithm, which matches the oracle-efficient upper bounds up to a polynomial factor, thus demonstrating the existence of a statistical-computational gap in smooth online learning. Finally, we apply our results to the contextual bandit setting to show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits when contexts arrive in a smooth manner.
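For concreteness, the smoothness condition on the adversary's distributions is usually formalized as follows; the parameter name σ and the dominating measure μ below are our notation, not taken from the abstract.

```latex
% Standard definition of a sigma-smooth distribution: the adversary may only
% play distributions whose density w.r.t. a fixed dominating measure mu is
% uniformly bounded by 1/sigma.
\[
  p \ \text{is } \sigma\text{-smooth with respect to } \mu
  \quad\Longleftrightarrow\quad
  \frac{\mathrm{d}p}{\mathrm{d}\mu}(x) \;\le\; \frac{1}{\sigma}
  \quad\text{for $\mu$-almost every } x .
\]
% Smaller sigma allows more concentrated (harder) distributions; sigma = 1
% forces the adversary to play the dominating measure itself.
```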
Local differential privacy (LDP) is a model where users send privatized data to an untrusted central server whose goal is to solve some data analysis task. In the non-interactive version of this model, the protocol consists of a single round in which the server sends requests to all users and then receives their responses. This version is deployed in industry due to its practical advantages and has attracted significant research interest. Our main result is an exponential lower bound on the number of samples necessary to solve the standard task of learning a large-margin linear separator in the non-interactive LDP model. Via a standard reduction, this lower bound implies an exponential lower bound for stochastic convex optimization and, specifically, for learning linear models with a convex, Lipschitz, and smooth loss. These results answer the questions posed by Smith, Thakurta, and Upadhyay (IEEE Symposium on Security and Privacy 2017) and Daniely and Feldman (NeurIPS 2019). Our lower bound relies on a new technique for constructing pairs of distributions with nearly matching moments but whose supports can be nearly separated by a large-margin hyperplane. These lower bounds also hold in the model where communication from each user is limited, and follow from a lower bound on learning using non-adaptive statistical queries.
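For reference, the local model described above constrains each user's report through a locally differentially private randomizer; a standard formulation (notation ours) is given below.

```latex
% A randomizer R : X -> Z is epsilon-locally differentially private if for all
% user inputs x, x' and all measurable sets of outputs S,
\[
  \Pr[\,R(x) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,R(x') \in S\,].
\]
% In the non-interactive version discussed above, the server chooses all
% randomizers up front, each user sends a single message R(x_i), and the
% server's answer is a function of those messages alone.
```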
A basic combinatorial interpretation of Shannon's entropy function is via the "20 questions" game. This cooperative game is played by two players, Alice and Bob: Alice picks a distribution π over the numbers {1, . . . , n}, and announces it to Bob. She then chooses a number x according to π, and Bob attempts to identify x using as few Yes/No queries as possible, on average. An optimal strategy for the "20 questions" game is given by a Huffman code for π: Bob's questions reveal the codeword for x bit by bit. This strategy finds x using fewer than H(π) + 1 questions on average. However, the questions asked by Bob could be arbitrary. In this paper, we investigate the following question: Are there restricted sets of questions that match the performance of Huffman codes, either exactly or approximately?

Our first main result shows that for every distribution π, Bob has a strategy that uses only questions of the form "x < c?" and "x = c?", and uncovers x using at most H(π) + 1 questions on average, matching the performance of Huffman codes in this sense. We also give a natural set of O(rn^{1/r}) questions that achieve a performance of at most H(π) + r, and show that Ω(rn^{1/r}) questions are required to achieve such a guarantee.

Our second main result gives a set Q of 1.25^{n+o(n)} questions such that for every distribution π, Bob can implement an optimal strategy for π using only questions from Q. We also show that 1.25^{n−o(n)} questions are needed, for infinitely many n. If we allow a small slack of r over the optimal strategy, then roughly (rn)^{Θ(1/r)} questions are necessary and sufficient.

We summarize this with the following meta-question, which guides this work: Are there "nice" sets of queries Q such that for any distribution, there is a "high quality" strategy that uses only queries from Q? Formalizing this question depends on how "nice" and "high quality" are quantified. We consider two different benchmarks for sets of queries:

1. An information-theoretical benchmark: A set of queries Q has redundancy r if for every distribution π there is a strategy using only queries from Q that finds x with at most H(π) + r queries on average when x is drawn according to π.

2. A combinatorial benchmark: A set of queries Q is r-optimal (or has prolixity r) if for every distribution π there is a strategy using queries from Q that finds x with at most Opt(π) + r queries on average when x is drawn according to π, where Opt(π) is the expected number of queries asked by an optimal strategy for π (e.g. a Huffman tree).

Given a certain redundancy or prolixity, we will be interested in sets of questions achieving that performance that (i) are as small as possible, and (ii) allow efficient construction of high quality strategies which achieve the target performance. In some cases we will settle for only one of these properties, and leave the other as an open question.

Information-theoretical benchmark. Let π be a distribution over X. A basic result in information theory is that every algorithm that reveals an unknown element x drawn according to π (in...
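As a concrete illustration of the Huffman-based strategy recalled at the start of this abstract, here is a minimal Python sketch. The helper names huffman_tree, leaves, and identify are ours, Alice is modeled as a membership oracle, and this illustrates only the classical unrestricted-question strategy, not the restricted-question strategies constructed in the paper.

```python
import heapq

def huffman_tree(pi):
    """Build a Huffman tree for a probability vector pi.
    Leaves are element indices 0..n-1; internal nodes are (left, right) pairs."""
    heap = [(p, i, i) for i, p in enumerate(pi)]  # (prob, unique tiebreak, node)
    heapq.heapify(heap)
    next_id = len(pi)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next_id, (a, b)))
        next_id += 1
    return heap[0][2]

def leaves(node):
    """Return the set of elements stored under a (sub)tree."""
    if isinstance(node, tuple):
        return leaves(node[0]) | leaves(node[1])
    return {node}

def identify(tree, is_member):
    """Bob's strategy: at each internal node ask one Yes/No question
    ('is x in the left subtree?') until a leaf is reached.
    is_member(S) models Alice truthfully answering 'is x in S?'."""
    node, questions = tree, 0
    while isinstance(node, tuple):
        left, right = node
        questions += 1
        node = left if is_member(leaves(left)) else right
    return node, questions

# Example: Bob finds x with exactly the Huffman codeword length of x many
# questions, which is less than H(pi) + 1 in expectation over x ~ pi.
pi = [0.5, 0.25, 0.125, 0.125]
tree = huffman_tree(pi)
x = 2
elem, used = identify(tree, lambda S: x in S)
assert elem == x
```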