Given two random variables X and Y, an operational approach is undertaken to quantify the "leakage" of information from X to Y. The resulting measure L(X→Y) is called maximal leakage, and is defined as the multiplicative increase, upon observing Y, of the probability of correctly guessing a randomized function of X, maximized over all such randomized functions. A closed-form expression for L(X→Y) is given for discrete X and Y, and it is subsequently generalized to handle a large class of random variables. The resulting properties are shown to be consistent with an axiomatic view of a leakage measure, and the definition is shown to be robust to variations in the setup. Moreover, a variant of the Shannon cipher system is studied, in which performance of an encryption scheme is measured using maximal leakage. A single-letter characterization of the optimal limit of (normalized) maximal leakage is derived, and asymptotically optimal encryption schemes are demonstrated. Furthermore, the sample complexity of estimating maximal leakage from data is characterized up to subpolynomial factors. Finally, the guessing framework used to define maximal leakage is used to give operational interpretations of commonly used leakage measures, such as Shannon capacity, maximal correlation, and local differential privacy.

(R4) It should accord with intuition. That is, it should not mischaracterize the (severity of) information leakage in systems that we understand well.
A. Common Information-Theoretic Approaches

Notably, many commonly-used information leakage metrics do not satisfy the above requirements. For example, mutual information, which has been frequently used as a leakage measure [3]-[5], [18]-[21], arguably fails to satisfy both (R1) and (R4). Regarding the latter, consider the following example proposed by Smith [22].

Example 1: Given n ∈ ℕ, let 𝒳 = {0, 1}^{8n} and X ∼ Unif(𝒳). Now consider the following two conditional distributions:
$$
P_{Y|X}(y \mid x) =
\begin{cases}
1, & x \equiv 0 \ (\mathrm{mod}\ 8) \text{ and } y = x, \\
1, & x \not\equiv 0 \ (\mathrm{mod}\ 8) \text{ and } y = 1, \\
0, & \text{otherwise},
\end{cases}
$$
and P_{Z|X} given deterministically by Z = (X_1, X_2, ..., X_{n+1}). Then the probability of guessing X correctly from Y is at least 1/8, whereas the probability of guessing X correctly from Z is only 2^{-7n+1}. However, one can readily verify that I(X; Y) ≈ (n + 0.169) log 2 ≤ I(X; Z) = (n + 1) log 2 [22]; these quantities are verified in the short calculation at the end of this subsection.

Regarding the former, note that operational interpretations of mutual information arise in transmission and compression settings, which differ from the security setting at hand. Moreover, in those settings, mutual information arises as part of a computable characterization of the solution rather than as part of the formulation itself; that is, the transmission and compression problems are not defined in terms of mutual information.

Mutual information could potentially be justified by appealing to rate-distortion theory [23, Section V]. In fact, a number of leakage measures in the literature are based on rate-distortion theory. For instance, Yamamoto [24] introduces a distortion function d and measures the privacy of P_{Y|X} using $\inf_{\hat{x}(\cdot)} \mathbb{E}\big[d\big(X, \hat{x}(Y)\big)\big]$. ...
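For concreteness, the quantities claimed in Example 1 follow from a direct calculation. The sketch below assumes the channel P_{Y|X} as reconstructed above (Y = X when X ≡ 0 mod 8, and Y = 1 otherwise) and measures information in bits, so that (n + 0.169) log 2 nats corresponds to n + 0.169 bits:
\begin{align*}
\Pr[\text{guess } X \text{ from } Y] &\;\ge\; \Pr[X \equiv 0 \ (\mathrm{mod}\ 8)] \;=\; \tfrac{1}{8},\\
\Pr[\text{guess } X \text{ from } Z] &\;=\; 2^{-(8n-(n+1))} \;=\; 2^{-7n+1},\\
I(X;Y) \;=\; H(X) - H(X \mid Y) &\;=\; 8n - \tfrac{7}{8}\log_2\!\left(7\cdot 2^{8n-3}\right) \;=\; n + \tfrac{7}{8}\left(3-\log_2 7\right) \;\approx\; n + 0.169 \ \text{bits},\\
I(X;Z) &\;=\; H(Z) \;=\; n+1 \ \text{bits}.
\end{align*}
Here the conditional entropy term uses the fact that, when Y = 1 (an event of probability 7/8), X is uniform over the 7 · 2^{8n-3} strings not divisible by 8, while Z = (X_1, ..., X_{n+1}) leaves the remaining 7n - 1 bits of X uniform.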