There are many important high-dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust the output quality of the agnostic learning algorithm? We propose a model in which to systematically study the design of tester-learner pairs (A, T), such that if the distribution on examples in the data passes the tester T, then one can safely trust the output of the agnostic learner A on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined runtime of n^{Õ(1/ε⁴)}. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite-sample Gaussian distribution testers do not exist for the L₁ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined runtimes of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and monotone Boolean functions under the uniform distribution over {0, 1}^n. Through these lower bounds we exhibit natural problems where there is a dramatic gap between the runtime of standard agnostic learning and the runtime of the best tester-learner pair.
The noise sensitivity of a Boolean function f : {0, 1}^n → {0, 1} is one of its fundamental properties. A function of a positive noise parameter δ, it is denoted NS_δ[f]. Here we study the algorithmic problem of approximating it for monotone f, such that NS_δ[f] ≥ 1/n^C for a constant C, and where δ satisfies 1/n ≤ δ ≤ 1/2. For such f and δ, we give a randomized algorithm performing O(min(1,
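For concreteness, NS_δ[f] is the probability that f(x) ≠ f(y) when x is uniform on {0, 1}^n and y is obtained from x by flipping each bit independently with probability δ. A minimal Monte Carlo sketch of this definition is below; it is the naive estimator, not the faster algorithm for monotone f that the abstract refers to, and its sample count is an arbitrary illustrative choice.

```python
import random

def noise_sensitivity(f, n, delta, samples=20000, rng=random):
    """Naive Monte Carlo estimate of NS_delta[f] = Pr[f(x) != f(y)],
    where x is uniform on {0,1}^n and y flips each bit of x
    independently with probability delta.  This sketch follows the
    definition directly; it is NOT the paper's algorithm."""
    disagree = 0
    for _ in range(samples):
        x = [rng.randrange(2) for _ in range(n)]
        # re-randomize each coordinate with probability delta
        y = [b ^ (rng.random() < delta) for b in x]
        if f(x) != f(y):
            disagree += 1
    return disagree / samples
```

For a dictator function f(x) = x₀, the estimate concentrates around δ, since f(x) ≠ f(y) exactly when bit 0 flips.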
We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given 2^{Õ(√n/ε)} uniformly random examples of an unknown function f : {±1}^n → {±1}, our algorithm outputs a hypothesis g : {±1}^n → {±1} that is monotone and (opt + ε)-close to f, where opt is the distance from f to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also 2^{Õ(√n/ε)}, nearly matching the lower bound of [BCO+15]. We also give an algorithm for estimating, up to additive error ε, the distance of an unknown function f to monotone using a runtime of 2^{Õ(√n/ε)}. Previously, for both of these problems, sample-efficient algorithms were known, but those algorithms were not runtime-efficient. Our work thus closes this gap between runtime and sample complexity. This work builds upon the improper learning algorithm of [BT96] and the proper semi-agnostic learning algorithm of [LRV22], which obtains a non-monotone Boolean-valued hypothesis and then "corrects" it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than 2opt + ε information-theoretically; we bypass this barrier by a) augmenting the improper learner with a convex optimization step, and b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the "poset sorting" problem of [LRV22] for functions over general posets with non-Boolean labels.
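As a concrete reference point for the distance-to-monotone quantity opt, it can be computed exactly for very small n: by a known characterization (Fischer et al., 2002), 2^n · opt equals the minimum vertex cover of the violation graph (pairs x ≤ y with f(x) = 1, f(y) = 0), which is bipartite between 1-points and 0-points, so König's theorem reduces it to maximum bipartite matching. The sketch below (over {0, 1}^n rather than {±1}^n, for simplicity) is a brute-force illustration, unrelated to the sample- and runtime-efficient algorithms of the abstract.

```python
from itertools import product

def dist_to_monotone(f, n):
    """Exact distance of f: {0,1}^n -> {0,1} to the nearest monotone
    function, for tiny n.  Uses the characterization that 2^n * dist
    equals the minimum vertex cover of the violation graph, computed
    via maximum bipartite matching (Konig's theorem)."""
    pts = list(product([0, 1], repeat=n))
    ones = [x for x in pts if f(x) == 1]
    zeros = [y for y in pts if f(y) == 0]
    below = lambda x, y: all(a <= b for a, b in zip(x, y))
    # Violated pairs: x <= y coordinatewise with f(x)=1 and f(y)=0.
    adj = {x: [y for y in zeros if below(x, y)] for x in ones}
    match = {}  # zero-point -> matched one-point

    def augment(x, seen):
        # Standard augmenting-path search for bipartite matching.
        for y in adj[x]:
            if y in seen:
                continue
            seen.add(y)
            if y not in match or augment(match[y], seen):
                match[y] = x
                return True
        return False

    matching = sum(augment(x, set()) for x in ones)
    return matching / 2 ** n
```

For example, the 2-bit AND is monotone (distance 0), while 2-bit parity is at distance 1/4: flipping f(1,1) from 0 to 1 makes it monotone, and no single change of fewer points suffices.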