An infinite urn scheme is defined by a probability mass function (pj) j≥1 over positive integers. A random allocation consists of a sample of N independent drawings according to this probability distribution where N may be deterministic or Poisson-distributed. This paper is concerned with occupancy counts, that is with the number of symbols with r or at least r occurrences in the sample, and with the missing mass that is the total probability of all symbols that do not occur in the sample. Without any further assumption on the sampling distribution, these random quantities are shown to satisfy Bernstein-type concentration inequalities. The variance factors in these concentration inequalities are shown to be tight if the sampling distribution satisfies a regular variation property. This regular variation property reads as follows. Let the number of symbols with probability larger than x be ν(x) = |{j: pj ≥ x}|. In a regularly varying urn scheme, ν satisfies limτ→0 ν(τ x)/ ν(τ ) = x −α for α ∈ [0, 1] and the variance of the number of distinct symbols in a sample tends to infinity as the sample size tends to infinity. Among other applications, these concentration inequalities allow us to derive tight confidence intervals for the Good-Turing estimator of the missing mass.
The study of fairness in intelligent decision systems has mostly ignored long-term influence on the underlying population. Yet fairness considerations (e.g. affirmative action) have often the implicit goal of achieving balance among groups within the population. The most basic notion of balance is eventual equality between the qualifications of the groups. How can we incorporate influence dynamics in decision making? How well do dynamics-oblivious fairness policies fare in terms of reaching equality? In this paper, we propose a simple yet revealing model that encompasses (1) a selection process where an institution chooses from multiple groups according to their qualifications so as to maximize an institutional utility and (2) dynamics that govern the evolution of the groups' qualifications according to the imposed policies. We focus on demographic parity as the formalism of affirmative action.We then give conditions under which an unconstrained policy reaches equality on its own. In this case, surprisingly, imposing demographic parity may break equality. When it doesn't, one would expect the additional constraint to reduce utility, however, we show that utility may in fact increase. In more realistic scenarios, unconstrained policies do not lead to equality. In such cases, we show that although imposing demographic parity may remedy it, there is a danger that groups settle at a worse set of qualifications. As a silver lining, we also identify when the constraint not only leads to equality, but also improves all groups. This gives quantifiable insight into both sides of the mismatch hypothesis. These cases and trade-offs are instrumental in determining when and how imposing demographic parity can be beneficial in selection processes, both for the institution and for society on the long run. *
In this paper, we study the problem of lossless universal source coding for stationary memoryless sources on countably infinite alphabets. This task is generally not achievable without restricting the class of sources over which universality is desired. Building on our prior work, we propose natural families of sources characterized by a common dominating envelope. We particularly emphasize the notion of adaptivity, which is the ability to perform as well as an oracle knowing the envelope, without actually knowing it. This is closely related to the notion of hierarchical universal source coding, but with the important difference that families of envelope classes are not discretely indexed and not necessarily nested.Our contribution is to extend the classes of envelopes over which adaptive universal source coding is possible, namely by including max-stable (heavy-tailed) envelopes which are excellent models in many applications, such as natural language modeling. We derive a minimax lower bound on the redundancy of any code on such envelope classes, including an oracle that knows the envelope. We then propose a constructive code that does not use knowledge of the envelope. The code is computationally efficient and is structured to use an Expanding Threshold for Auto-Censoring, and we therefore dub it the ETAC-code. We prove that the ETAC-code achieves the lower bound on the minimax redundancy within a factor logarithmic in the sequence length, and can be therefore qualified as a near-adaptive code over families of heavy-tailed envelopes. For finite and lighttailed envelopes the penalty is even less, and the same code follows closely previous results that explicitly made the lighttailed assumption. Our technical results are founded on methods from regular variation theory and concentration of measure.
This paper shows that one cannot learn the probability of rare events without imposing further structural assumptions. The event of interest is that of obtaining an outcome outside the coverage of an i.i.d. sample from a discrete distribution. The probability of this event is referred to as the "missing mass". The impossibility result can then be stated as: the missing mass is not distribution-free PAC-learnable in relative error. The proof is semi-constructive and relies on a coupling argument using a dithered geometric distribution. This result formalizes the folklore that in order to predict rare events, one necessarily needs distributions with "heavy tails".
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.