Whether or not the problem of finding maximal independent sets (MIS) in hypergraphs is in (R)NC is one of the fundamental problems in the theory of parallel computing. Unlike the well-understood case of MIS in graphs, for the hypergraph problem our knowledge is quite limited despite considerable work. It is known that the problem is in RNC when the edges of the hypergraph have constant size. For general hypergraphs with n vertices and m edges, the fastest previously known algorithm works in time O(√n) with poly(m, n) processors. In this paper we give an EREW PRAM algorithm that works in time n^{o(1)} with poly(m, n) processors on general hypergraphs satisfying m ≤ n^{log^{(2)} n / (8(log^{(3)} n)^2)}, where log^{(2)} n = log log n and log^{(3)} n = log log log n. Our algorithm is based on a sampling idea that reduces the dimension of the hypergraph and employs the algorithm for constant-dimension hypergraphs as a subroutine.
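The following is a minimal, purely illustrative sketch of the dimension-reducing effect of vertex sampling that the abstract alludes to, not the authors' actual parallel algorithm. An edge only constrains independence if it survives sampling entirely, so large edges rarely survive and the sampled sub-hypergraph tends to have small edges, where a constant-dimension MIS subroutine applies. The helper names (`greedy_mis`, `sampled_subproblem`) are hypothetical, and the greedy routine is a sequential stand-in for the RNC subroutine.

```python
import random

def greedy_mis(vertices, edges):
    """Sequential stand-in for the constant-dimension MIS subroutine:
    add a vertex unless doing so would fully contain some edge."""
    independent = set()
    for v in vertices:
        trial = independent | {v}
        if not any(e <= trial for e in edges):
            independent = trial
    return independent

def sampled_subproblem(vertices, edges, p):
    """One illustrative sampling round: keep each vertex with probability p.
    Only edges that survive in full still constrain independence, so the
    surviving edges are typically much smaller than the originals."""
    sample = {v for v in vertices if random.random() < p}
    surviving_edges = [e for e in edges if e <= sample]
    return sample, surviving_edges

# Tiny usage example: vertices 0..9, three hyperedges as frozensets.
vertices = set(range(10))
edges = [frozenset({0, 1, 2}), frozenset({2, 3, 4, 5, 6}), frozenset({7, 8})]
sub_vertices, sub_edges = sampled_subproblem(vertices, edges, p=0.5)
# The result is independent (though not yet maximal) in the full hypergraph.
print(greedy_mis(sorted(sub_vertices), sub_edges))
```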
Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters, especially if the data is already biased. At NIPS 2017, Chierichetti et al. [14] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and an O(t)-approximation for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center [34]. We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives k-center, k-supplier, k-median, k-means, and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
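As a small illustration of the fairness notion described in the abstract (each cluster approximately preserving the global fraction of each protected class), here is a hedged sketch of a fairness check. The additive `slack` parameter and the function name `is_essentially_fair` are illustrative assumptions; the paper uses its own parameterization (the parameter t and a relaxed fairness notion) rather than this exact form.

```python
from collections import Counter

def is_essentially_fair(assignment, labels, slack=0.1):
    """Check that in every cluster the fraction of each protected class
    stays within an additive `slack` of that class's global fraction."""
    n = len(labels)
    global_frac = {c: cnt / n for c, cnt in Counter(labels).items()}

    clusters = {}
    for point, cluster in enumerate(assignment):
        clusters.setdefault(cluster, []).append(labels[point])

    for members in clusters.values():
        local = Counter(members)
        for cls, target in global_frac.items():
            frac = local.get(cls, 0) / len(members)
            if abs(frac - target) > slack:
                return False
    return True

# Example: 6 points, two protected classes A/B, two clusters.
labels = ["A", "A", "B", "B", "A", "B"]   # protected class per point
assignment = [0, 0, 0, 1, 1, 1]           # cluster index per point
print(is_essentially_fair(assignment, labels, slack=0.2))  # True
```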
Filter data structures have been used ubiquitously since the 1970s to answer approximate set-membership queries in various areas of computer science including architecture, networks, operating systems, and databases. Such filters need to be allocated with a given capacity in advance to provide a guarantee over the false positive rate. In many applications, however, the data size is not known in advance, requiring filters to dynamically expand. This paper shows that existing methods for expanding filters exhibit at least one of the following flaws: (1) they entail an expensive scan over the whole data set, (2) they require a lavish memory footprint, (3) their query, delete and/or insertion performance plummets, (4) their false positive rate skyrockets, and/or (5) they cannot expand indefinitely. We introduce InfiniFilter, a new method for expanding filters that addresses these shortcomings. InfiniFilter is a hash table that stores a fingerprint for each entry. It doubles in size when it reaches capacity, and it sacrifices one bit from each fingerprint to map it to the expanded hash table. The core novelty is a new and flexible hash slot format that assigns longer fingerprints to newer entries. This keeps the average fingerprint length long and thus the false positive rate stable. At the same time, InfiniFilter provides stable insertion/query/delete performance as it consists of a unified hash table. We implement InfiniFilter on top of Quotient Filter, and we demonstrate theoretically and empirically that it offers superior cost properties compared to existing methods: it scales better with respect to performance, the false positive rate, and the memory footprint, all at the same time.
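The sketch below illustrates only the bit-sacrifice step described in the abstract: when the table doubles, one fingerprint bit is given up and becomes part of the slot address, so older entries end up with shorter stored fingerprints while entries inserted after the expansion keep full-length ones. The function name and the exact bit order are assumptions for illustration; this is not InfiniFilter's actual slot format.

```python
def expand_entry(slot, fingerprint, fp_len):
    """Re-map one entry when the filter doubles (illustrative convention):
    the fingerprint's top bit is 'sacrificed' and appended to the slot
    address, sending old slot s to slot 2s or 2s+1 in the doubled table,
    and the stored fingerprint becomes one bit shorter."""
    stolen_bit = (fingerprint >> (fp_len - 1)) & 1        # top fingerprint bit
    new_slot = (slot << 1) | stolen_bit                   # address gains one bit
    new_fingerprint = fingerprint & ((1 << (fp_len - 1)) - 1)
    return new_slot, new_fingerprint, fp_len - 1

# Example: an entry in slot 5 with a 16-bit fingerprint.
slot, fp, bits = 5, 0b1010_0110_0011_1001, 16
print(expand_entry(slot, fp, bits))  # -> (11, 9785, 15)
```

Under this convention, repeated doublings shave one bit per expansion off the oldest entries' fingerprints, which is why assigning full-length fingerprints to newer entries keeps the average fingerprint length, and hence the false positive rate, stable.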