Given a positive function g from [0, 1] to the reals, the function's missing mass in a sequence of iid samples, defined as the sum of g(Pr(x)) over the missing letters x, is introduced and studied. The missing mass of a function generalizes the classical missing mass, and has several interesting connections to other related estimation problems. Minimax estimation is studied for the order-α missing mass (g(p) = p^α) for both integer and non-integer values of α. Exact minimax convergence rates are obtained for the integer case. Concentration is studied for a class of functions, and specific results are derived for the order-α missing mass and the missing Shannon entropy (g(p) = −p log p). Sub-Gaussian tail bounds with near-optimal worst-case variance factors are derived. Two new notions of concentration, named strongly sub-Gamma and filtered sub-Gaussian concentration, are introduced and shown to result in right tail bounds that are better than those obtained from sub-Gaussian concentration.

Index terms—Missing mass, Good-Turing estimator, missing mass of a function, entropy, mean squared error, minimax optimality, concentration, tail bounds, sub-Gaussian and sub-Gamma tails.
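To make the central quantity concrete, the following is a minimal sketch (not from the paper) of the missing mass of a function: given a known distribution and an iid sample, it sums g(p_x) over the letters x that never appear in the sample. The distribution, sample, and function names below are hypothetical, chosen only for illustration.

```python
import math

def missing_mass(p, sample, g):
    """Missing mass of a function g: sum of g(p_x) over letters x
    of the distribution p that are absent from the sample."""
    seen = set(sample)
    return sum(g(px) for x, px in p.items() if x not in seen)

# Hypothetical distribution on a four-letter alphabet and a fixed sample.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
sample = ["a", "b", "a", "a", "b"]  # letters "c" and "d" are missing

m0 = missing_mass(p, sample, lambda q: q)                 # classical missing mass, g(p) = p
m2 = missing_mass(p, sample, lambda q: q ** 2)            # order-2 missing mass, g(p) = p^2
h  = missing_mass(p, sample, lambda q: -q * math.log(q))  # missing Shannon entropy
```

Here m0 = 0.125 + 0.125 = 0.25, the classical missing mass; choosing a different g recovers each of the special cases studied in the paper.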
I. INTRODUCTION

Let P be an arbitrary discrete distribution on an alphabet X. For x ∈ X, let p_x ≜ P(x). Given a positive function g : [0, 1] → [0, ∞), we define a so-called additive function G(P) over the distribution P as follows: