The combination of two major challenges in machine learning is investigated: dealing with large amounts of irrelevant information and learning from noisy data. It is shown that large classes of Boolean concepts that depend on only a small number of variables (so-called juntas) can be learned efficiently from random examples corrupted by random attribute and classification noise. To accomplish this goal, a two-phase algorithm is presented that copes with several problems arising from the presence of noise: first, a suitable method for approximating Fourier coefficients in the presence of noise is applied to infer the relevant variables; second, since one cannot simply read off a truth table from the examples as in the noise-free case, an alternative method for building a hypothesis is established and applied to the examples restricted to the relevant variables. In particular, for the class of monotone juntas depending on d out of n variables, the sample complexity is polynomial in log(n/δ), 2^d, γ_a^{-d}, and γ_b^{-1}, where δ is the confidence parameter and γ_a, γ_b > 0 are noise parameters bounding the noise rates away from 1/2. The running time is bounded by the sample complexity times a polynomial in n. So far, all results hold for the case of uniformly distributed examples, the only case that (apart from side notes) has been studied in the literature to date. We show how to extend our methods to non-uniformly distributed examples and derive new results for monotone juntas. For the attribute noise, we have to assume that it is generated by a product distribution, since otherwise fault-tolerant learning is in general impossible: we construct a noise distribution P and a concept class C such that it is impossible to learn C under P-noise.
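The first phase, identifying the relevant variables via Fourier coefficients, can be illustrated with a small sketch. All specifics here are assumptions for the toy instance (uniform examples, classification noise only, an OR junta, and a hand-picked threshold); the paper's actual algorithm, its handling of attribute noise, and its parameter choices differ. Under the uniform distribution with labels in {-1, +1}, the degree-1 coefficient of variable i is E[y · x_i], and classification noise of rate η merely shrinks it by (1 - 2η), which is why the noise rate must be bounded away from 1/2:

```python
import random

def estimate_relevant(samples, n, threshold):
    """Estimate the degree-1 Fourier coefficients  f^({i}) = E[y * x_i]
    from labeled examples and return the variables whose estimated
    coefficient exceeds the threshold in absolute value."""
    est = [0.0] * n
    for x, y in samples:
        for i in range(n):
            est[i] += y * x[i]
    m = len(samples)
    return [i for i in range(n) if abs(est[i]) / m > threshold]

# Toy data: a monotone 3-junta (OR of x_2, x_5, x_11 in +/-1 encoding,
# with +1 meaning "true") under classification noise of rate eta = 0.1.
rng = random.Random(0)
n, relevant, eta, m = 20, [2, 5, 11], 0.1, 20000
samples = []
for _ in range(m):
    x = [rng.choice([-1, 1]) for _ in range(n)]
    y = 1 if any(x[i] == 1 for i in relevant) else -1  # the hidden junta
    if rng.random() < eta:                             # classification noise
        y = -y
    samples.append((x, y))
print(estimate_relevant(samples, n, 0.1))
```

For this junta the noiseless coefficient of each relevant variable is 1/4, shrunk to 0.2 by the noise, while irrelevant variables concentrate near 0; with 20000 samples the threshold 0.1 separates the two groups with overwhelming probability, so the call recovers {2, 5, 11}.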
Given a set of monomials, the Minimum AND-Circuit problem asks for a circuit of minimum size that computes these monomials using AND gates of fan-in two. We prove that the problem is not polynomial-time approximable within a factor of less than 1.0051 unless P = NP, even if the monomials are restricted to be of degree at most three. For the latter case, we devise several efficient approximation algorithms, yielding an approximation ratio of 1.278. For the general problem, we achieve an approximation ratio of d − 3/2, where d is the degree of the largest monomial. In addition, we prove that the problem is fixed-parameter tractable with the number of monomials as parameter. Finally, we reveal connections between the Minimum AND-Circuit problem and several problems from different areas.
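The source of the savings, reusing one AND gate's output in several monomials, can be made concrete with a toy sketch. The greedy pair-merging heuristic below is purely illustrative (it is not one of the paper's approximation algorithms, and its ratio guarantees are not claimed here):

```python
from collections import Counter
from itertools import combinations

def greedy_and_circuit_size(monomials):
    """Greedy heuristic (illustration only): repeatedly replace the most
    frequent pair of factors across all monomials with a fresh AND-gate
    output, and count the gates introduced."""
    mons = [set(m) for m in monomials]  # each monomial as a set of factors
    gates = 0
    while any(len(m) > 1 for m in mons):
        counts = Counter()
        for m in mons:
            counts.update(combinations(sorted(m), 2))
        pair = max(sorted(counts), key=counts.__getitem__)  # deterministic tie-break
        gates += 1
        out = f"g{gates}"  # output wire of the new AND gate
        for m in mons:
            if pair[0] in m and pair[1] in m:
                m.difference_update(pair)
                m.add(out)
    return gates

mons = ["abc", "abd", "cd"]
naive = sum(len(m) - 1 for m in mons)       # one gate per join, no sharing: 5
print(naive, greedy_and_circuit_size(mons))
```

On this instance the heuristic first builds a gate for a·b and reuses it in both abc and abd, finishing with 4 gates instead of the 5 a sharing-free circuit needs.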