The primary problem in property testing is to decide whether a given function satisfies a certain property, or is far from any function satisfying it. This crucially requires a notion of distance between functions. The most prevalent notion is the Hamming distance over the uniform distribution on the domain. This restriction to uniformity is more a matter of convenience than of necessity, and it is important to investigate distances induced by more general distributions. In this paper, we make significant strides in this direction. We give simple and optimal testers for bounded derivative properties over arbitrary product distributions. Bounded derivative properties include fundamental properties such as monotonicity and Lipschitz continuity. Our results subsume almost all known results (upper and lower bounds) on monotonicity and Lipschitz testing. We prove an intimate connection between bounded derivative property testing and binary search trees (BSTs). We exhibit a tester whose query complexity is the sum of expected depths of optimal BSTs for each marginal. Furthermore, we show this sum-of-depths is also a lower bound. A fundamental technical contribution of this work is an optimal dimension reduction theorem for all bounded derivative properties, which relates the distance of a function from the property to the distance of restrictions of the function to random lines. Such a theorem has been elusive even for monotonicity for the past 15 years, and our theorem is an exponential improvement to the previous best known result.
This means the $r$th partial derivative of $f$ is bounded by quantities that only depend on the $r$th coordinate. Note that this dependence is completely arbitrary, and different dimensions can have completely different bounds. This forms a rich class of properties which includes monotonicity and $c$-Lipschitz continuity. To get monotonicity, simply set $l_r(y) = 0$ and $u_r(y) = \infty$ for all $r$.
To get $c$-Lipschitz continuity, set $l_r(y) = -c$ and $u_r(y) = +c$ for all $r$. The class also includes the property demanding monotonicity for some (fixed) coordinates and $c$-Lipschitz continuity for others, and the non-uniform Lipschitz property that demands different Lipschitz constants for different coordinates.
Definition 1.2. Fix a bounding family $B$ and product distribution $D = \prod_{r \le d} D_r$. Define $\mathrm{dist}_D(f, g) = \Pr_{x \sim D}[f(x) \neq g(x)]$. A property tester for $P(B)$ with respect to $D$ takes as input proximity parameter $\varepsilon > 0$ and has query access to function $f$. If $f \in P(B)$, the tester accepts with probability > 2/3. If $\mathrm{dist}_D(f, P(B)) > \varepsilon$, the tester rejects with probability > 2/3.
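The bounding-family view and the distance in Definition 1.2 can be made concrete with a small brute-force sketch. It rests on assumptions not stated in this excerpt: the domain is taken to be the grid {0, ..., n-1}^d, and the "partial derivative" in coordinate r is read as the discrete difference f(x + e_r) - f(x). The names `satisfies_bounded_derivative`, `dist`, `lower`, `upper`, and `marginals` are hypothetical. This enumerates the whole domain, so it only illustrates the definitions and is not a sublinear tester.

```python
import itertools
import math


def satisfies_bounded_derivative(f, n, d, lower, upper):
    """Check lower(r, x_r) <= f(x + e_r) - f(x) <= upper(r, x_r) for all x, r.

    Assumption: domain is {0..n-1}^d and the derivative is a discrete difference.
    """
    for x in itertools.product(range(n), repeat=d):
        for r in range(d):
            if x[r] + 1 >= n:
                continue  # no neighbour in direction r at the boundary
            y = x[:r] + (x[r] + 1,) + x[r + 1:]
            diff = f(y) - f(x)
            if not (lower(r, x[r]) <= diff <= upper(r, x[r])):
                return False
    return True


def dist(f, g, n, marginals):
    """Pr_{x ~ D}[f(x) != g(x)] where D is the product of the given marginals."""
    d = len(marginals)
    total = 0.0
    for x in itertools.product(range(n), repeat=d):
        if f(x) != g(x):
            total += math.prod(marginals[r][x[r]] for r in range(d))
    return total


# Monotonicity: l_r = 0, u_r = infinity for every coordinate r.
mono_lower = lambda r, y: 0
mono_upper = lambda r, y: math.inf

# c-Lipschitz continuity with c = 1: l_r = -1, u_r = +1.
lip_lower = lambda r, y: -1
lip_upper = lambda r, y: 1
```

For instance, `satisfies_bounded_derivative(lambda x: sum(x), 3, 2, mono_lower, mono_upper)` checks that the coordinate sum is monotone on a 3 x 3 grid, and `dist` weights each disagreement by the product of its marginal probabilities rather than counting points uniformly.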
Property testers form an important class of sublinear algorithms. In the standard property testing model, an algorithm accesses the input function f : D → R via an oracle. With very few exceptions, all property testers studied in this model rely on the oracle to provide function values at all queried domain points. However, in many realistic situations, the oracle may be unable to reveal the function values at some domain points due to privacy concerns, or when some of the values get erased by mistake or by an adversary. The testers do not learn anything useful about the property by querying those erased points. Moreover, the knowledge of a tester may enable an adversary to erase some of the values so as to increase the query complexity of the tester arbitrarily or, in some cases, make the tester entirely useless. In this work, we initiate a study of property testers that are resilient to the presence of adversarially erased function values. An α-erasure-resilient ε-tester is given parameters α, ε ∈ (0, 1), along with oracle access to a function f such that at most an α fraction of function values have been erased. The tester does not know whether a value is erased until it queries the corresponding domain point. The tester has to accept with high probability if there is a way to assign values to the erased points such that the resulting function satisfies the desired property P. It has to reject with high probability if, for every assignment of values to the erased points, the resulting function has to be changed in at least an ε-fraction of the non-erased domain points to satisfy P. We design erasure-resilient property testers for a large class of properties. For some properties, it is possible to obtain erasure-resilient testers by simply using standard testers as a black box. However, there are more challenging properties for which all known testers rely on querying a specific point. If this point is erased, all these testers break. 
We give efficient erasure-resilient testers for several important classes of such properties of functions including monotonicity, the Lipschitz property, and convexity. Finally, we show a separation between standard and erasure-resilient testing. Specifically, we describe a property that can be ε-tested with O(1/ε) queries in the standard model, whereas testing it in the erasure-resilient model requires a number of queries polynomial in the input size.
1 Sublinear-time algorithms with various distributional assumptions on the positions of the input the algorithms access have been investigated, for example, in [GGR98, BBBY12, GR16]. There is also a line of work, initiated by [BFR+13], that studies sublinear algorithms that access distributions, as opposed to fixed datasets. In this work, we focus on fixed datasets.
Relationships to other models. We explore the relationship of erasure-resilient testing with other testing models in Section 7. We provide (in Section 7.1) a separation between our erasure-resilient model and the standard model. Specifically, we prove the existence of a ...
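The accept and reject conditions of the erasure-resilient model can be illustrated for one simple case. The sketch below is not one of the testers from the work described above; it is a brute-force illustration that assumes a function on a line, stored as a Python list in which `None` marks an erased value, and the property of monotonicity (non-decreasing values). The names `ERASED`, `has_monotone_completion`, and `erasure_distance_to_monotone` are hypothetical. It reads every value, so it shows what the definition asks for rather than how to test it with few queries.

```python
from bisect import bisect_right

ERASED = None  # assumption: erased function values are represented by None


def has_monotone_completion(values):
    """True if the erased points can be filled in to give a non-decreasing list.

    This holds exactly when the non-erased values are already non-decreasing,
    since each erased point can copy the nearest non-erased value to its left.
    """
    prev = None
    for v in values:
        if v is ERASED:
            continue
        if prev is not None and v < prev:
            return False
        prev = v
    return True


def erasure_distance_to_monotone(values):
    """Least fraction of non-erased points that must change to allow a
    monotone completion (non-erased count minus the longest non-decreasing
    subsequence, divided by the non-erased count)."""
    kept = [v for v in values if v is not ERASED]
    if not kept:
        return 0.0
    tails = []  # tails[k] = smallest tail of a non-decreasing subsequence of length k+1
    for v in kept:
        i = bisect_right(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return (len(kept) - len(tails)) / len(kept)
```

In this reading, an input such as `[1, None, 3, None, 5]` must be accepted, while an input is ε-far when `erasure_distance_to_monotone` returns at least ε.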
The suffix tree is one of the most important data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, the main memory size often becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length n from alphabet Σ = {A, C, G, T} can be stored in n log |Σ| = 2n bits, its suffix tree occupies O(n log n) bits. In practice, the size difference easily reaches a factor of 50. We report on an implementation of the compressed suffix tree very recently proposed by Sadakane (2007). The compressed suffix tree occupies space proportional to the text size, that is, O(n log |Σ|) bits, and supports all typical suffix tree operations with at most a log n factor slowdown. Our experiments show that, for example, on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of the normal suffix tree. At the same time, a representative algorithm is slowed down by a factor of 30. Our implementation follows the original proposal in spirit, but some internal parts are tailored toward practical implementation. Our construction algorithm has time requirement O(n log n log |Σ|) and uses nearly the same space as the final structure while constructing it: on the 10 MB DNA sequence, the maximum space usage during construction is only 1.5 times the final product size. As by-products, we develop a method to create a Succinct Suffix Array directly from the Burrows-Wheeler transform and a space-efficient version of the suffixes-insertion algorithm to build a balanced parentheses representation of the suffix tree from LCP information.
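The 2n-bit figure for the raw text follows from spending log |Σ| = 2 bits on each of the four nucleotides. A minimal sketch of such a packing is shown below; it only illustrates the baseline text size and is unrelated to the compressed suffix tree implementation itself. The names `pack_dna` and `unpack_dna` and the particular bit layout (four bases per byte, lowest bits first) are assumptions made for illustration.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit code per base (illustrative choice)
BASES = "ACGT"


def pack_dna(seq):
    """Pack a string over {A, C, G, T} into 2 bits per character."""
    out = bytearray((len(seq) + 3) // 4)  # four bases fit in one byte
    for i, ch in enumerate(seq):
        out[i // 4] |= CODE[ch] << (2 * (i % 4))
    return bytes(out)


def unpack_dna(packed, n):
    """Recover the first n bases from a buffer produced by pack_dna."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(n)
    )
```

The sequence length n has to be stored separately, because the final byte may be only partly used.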
Abstract. In the past few years, the focus of research in the area of statistical data privacy has been on designing algorithms for various problems that satisfy some rigorous notion of privacy. However, not much effort has gone into designing techniques to computationally verify whether a given algorithm satisfies some predefined notion of privacy. In this work, we address the following question: Can we design algorithms that test whether a given algorithm satisfies some specific rigorous notion of privacy (e.g., differential privacy)? We design algorithms to test privacy guarantees of a given algorithm A when run on a dataset x containing potentially sensitive information about individuals. More formally, we design a computationally efficient algorithm Tpriv that verifies whether A satisfies the differential privacy on typical datasets (DPTD) guarantee in time sublinear in the size of the domain of the datasets. DPTD, a notion similar to generalized differential privacy first proposed by [3], is a distributional relaxation of the popular notion of differential privacy [14]. To design algorithm Tpriv, we show a formal connection between the testing of the privacy guarantee for an algorithm and the testing of the Lipschitz property of a related function. More specifically, we show that an efficient algorithm for testing the Lipschitz property can be used as a subroutine in Tpriv that tests whether an algorithm satisfies differential privacy on typical datasets. Apart from formalizing the connection between the testing of the privacy guarantee and testing of the Lipschitz property, we generalize the work of [21] to the setting of property testing under product distributions. More precisely, we design an efficient Lipschitz tester for the case where the domain points are drawn from the hypercube according to some fixed but unknown product distribution instead of the uniform distribution.
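The abstract does not say which "related function" is used. One standard way to see a link of this kind, offered here purely as an assumption and not as the construction of that paper, is that a mechanism with discrete outputs is ε-differentially private exactly when, for each output o, the function x ↦ log Pr[A(x) = o] changes by at most ε between neighbouring datasets, i.e. it is ε-Lipschitz. The sketch below checks this by brute force for per-bit randomized response on datasets in {0, 1}^d; the names `rr_log_prob` and `is_lipschitz_on_hypercube` are hypothetical, and the check enumerates the whole hypercube rather than running a sublinear tester.

```python
import itertools
import math


def rr_log_prob(x, o, eps):
    """log Pr[A(x) = o] when A reports each bit of x truthfully with
    probability e^eps / (1 + e^eps), independently (randomized response)."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return sum(
        math.log(p_keep if xi == oi else 1.0 - p_keep) for xi, oi in zip(x, o)
    )


def is_lipschitz_on_hypercube(g, d, c, tol=1e-9):
    """Check |g(x) - g(y)| <= c for every pair x, y in {0,1}^d that differ
    in exactly one coordinate (tol absorbs floating-point rounding)."""
    for x in itertools.product((0, 1), repeat=d):
        for i in range(d):
            if x[i] == 1:
                continue  # visit each hypercube edge once, from its 0-endpoint
            y = x[:i] + (1,) + x[i + 1:]
            if abs(g(y) - g(x)) > c + tol:
                return False
    return True
```

For a fixed output `o`, passing `lambda x: rr_log_prob(x, o, eps)` as `g` and `eps` as `c` checks the Lipschitz condition that corresponds to ε-differential privacy for that output.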
Suffix tree is one of the most important data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, often the main memory size becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length n from alphabet Σ = {A, C, G, T} can be stored in n log |Σ| = 2n bits, its suffix tree occupies O(n log n) bits. In practice, the size difference easily reaches factor 50. We report on an implementation of the compressed suffix tree very recently proposed by Sadakane (Theory of Computing Systems, in press). The compressed suffix tree occupies space proportional to the text size, i.e. O(n log |Σ|) bits, and supports all typical suffix tree operations with at most log n factor slowdown. Our experiments show that, e.g. on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of the normal suffix tree. At the same time, a representative algorithm is slowed down by factor 30. Our implementation follows the original proposal in spirit, but some internal parts are tailored towards practical implementation. Our construction algorithm has time requirement O(n log n log |Σ|) and uses closely the same space as the final structure while constructing it: on the 10 MB DNA sequence, the maximum space usage during construction is only 1.5 times the final product size. As by-products, we develop a method to create Succinct Suffix Array directly from Burrows-Wheeler transform and a space-efficient version of suffixes-insertion algorithm to build balanced parentheses representation of suffix tree from LCP information.