Warren Schudy scite author profile

Correlation clustering is a type of clustering that uses a basic form of input data: for every pair of data items, the input specifies whether they are similar (belonging to the same cluster) or dissimilar (belonging to different clusters). This information may be inconsistent, and the goal is to find a clustering (partition of the vertices) that disagrees with as few pieces of information as possible. Correlation clustering is APX-hard for worst-case inputs. We study the following semi-random noisy model to generate the input: start from an arbitrary partition of the vertices into clusters. Then, for each pair of vertices, the similarity information is corrupted (noisy) independently with probability $p$. Finally, an adversary generates the input by choosing similarity/dissimilarity information arbitrarily for each corrupted pair of vertices. In this model, our algorithm produces a clustering with cost at most $1 + O(n^{-1/6})$ times the cost of the optimal clustering, as long as $p \le 1/2 - n^{-1/3}$. Moreover, if all clusters have size at least $c_1 \sqrt{n}$, then we can exactly reconstruct the planted clustering. If the noise $p$ is small, that is, $p \le n^{-\delta}/60$, then we can exactly reconstruct all clusters of the planted clustering that have size at least $3150/\delta$, and provide a certificate (witness) proving that those clusters are in any optimal clustering. Among other techniques, we use the natural semidefinite programming relaxation followed by an interesting rounding phase. The analysis uses SDP duality and spectral properties of random matrices.
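The semi-random model and the disagreement cost described in this abstract can be illustrated with a small Python sketch. It is not the paper's algorithm (no SDP or rounding is involved); it only generates noisy pair labels and counts disagreements. The function names and the `adversary` callback are hypothetical, and the default adversary simply flips the true label, which is only one of the choices the model allows.

```python
import itertools
import random

def planted_labels(cluster_of, p, rng, adversary=None):
    """Return {(u, v): True if labelled 'similar'} for all pairs u < v.

    Each pair is corrupted independently with probability p. The label of a
    corrupted pair is chosen by `adversary` (a hypothetical callback); by
    default the adversary flips the true label.
    """
    n = len(cluster_of)
    labels = {}
    for u, v in itertools.combinations(range(n), 2):
        truth = cluster_of[u] == cluster_of[v]
        if rng.random() < p:
            labels[(u, v)] = adversary(u, v, truth) if adversary else not truth
        else:
            labels[(u, v)] = truth
    return labels

def disagreements(cluster_of, labels):
    """Count the pairs whose label contradicts the given clustering."""
    return sum((cluster_of[u] == cluster_of[v]) != similar
               for (u, v), similar in labels.items())

rng = random.Random(0)
planted = [0, 0, 0, 1, 1, 2]            # cluster id of each vertex
clean = planted_labels(planted, 0.0, rng)
noisy = planted_labels(planted, 0.2, rng)
print(disagreements(planted, clean))    # 0: the planted clustering fits clean labels
print(disagreements(planted, noisy))    # number of corrupted pairs under the flipping adversary
```

With an adversary that may also leave a corrupted pair unchanged, the planted clustering's cost is at most the number of corrupted pairs, and the optimal clustering may differ from the planted one.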


Concentration and Moment Inequalities for Polynomials of Independent Random Variables

Schudy, Sviridenko, 2012


Polynomials of independent random variables arise in a variety of fields such as Machine Learning, Analysis of Boolean Functions, Additive Combinatorics, Random Graph Theory, Stochastic Partial Differential Equations, etc. They naturally model the expected value of the objective function (or the left-hand side of constraints) for randomized rounding algorithms for non-linear optimization problems, where one finds a solution of an "easy" continuous problem and rounds it to a solution of a "hard" integral problem (one such example is Convex Integer Programming [6]). To measure the performance guarantee of such algorithms one needs an analog of Chernoff bounds for polynomials of independent random variables, in the same way that Raghavan and Thompson [17] used Chernoff bounds in their analysis of Boolean integer programming problems. There are many known forms and variations of Chernoff bounds. One of the tightest, known as the Bernstein inequality, is based on the variance of a sum of random variables. Another popular, albeit weaker, version estimates the variance through the expectation. Concentration inequalities of the latter kind are known for polynomials of independent random variables [12,18]. In this paper we derive an analog of the Bernstein inequality for multilinear polynomials of independent random variables. We show that the probability that a multilinear polynomial $f$ of independent random variables exceeds its mean by $\lambda$ is at most $e^{-\lambda^2/(R^q \mathrm{Var}(f))}$ for sufficiently small $\lambda$, where $R$ is an absolute constant. This matches (up to constants in the exponent) what one would expect from the central limit theorem. Our methods handle a variety of types of random variables including Gaussian, Boolean, exponential, and Poisson. Previous work by Kim-Vu and Schudy-Sviridenko gave bounds of the same form that involved less natural parameters in place of the variance.
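To make the quantities in the bound concrete, the following Python sketch computes the exact mean, variance, and upper tail of a small degree-2 multilinear polynomial of independent uniform ±1 variables by enumeration. It does not reproduce the paper's proof or constants: the helper names are hypothetical, the value passed for `R` is a placeholder, and reading `q` as the degree of the polynomial is an assumption, since the abstract does not define it.

```python
import itertools
import math

def poly(weights, x):
    """Degree-2 multilinear polynomial: sum over i < j of w[i, j] * x[i] * x[j]."""
    return sum(w * x[i] * x[j] for (i, j), w in weights.items())

def exact_moments(weights, n):
    """Mean, variance, and all values of poly under uniform +/-1 inputs."""
    values = [poly(weights, x) for x in itertools.product((-1, 1), repeat=n)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var, values

def upper_tail(values, mean, lam):
    """Exact probability that the polynomial exceeds its mean by at least lam."""
    return sum(v - mean >= lam for v in values) / len(values)

def bound_shape(lam, var, R, q):
    """exp(-lam^2 / (R^q * Var(f))); R is a placeholder, q is assumed to be the degree."""
    return math.exp(-lam ** 2 / (R ** q * var))

n = 4
weights = {(i, j): 1.0 for i, j in itertools.combinations(range(n), 2)}
mean, var, values = exact_moments(weights, n)
print(mean, var)                      # 0.0 6.0
print(upper_tail(values, mean, 6.0))  # 0.125
print(bound_shape(6.0, var, R=2.0, q=2))
```

For ±1 inputs the monomials $x_i x_j$ are uncorrelated with unit variance, so the variance equals the sum of squared coefficients (6 here), which is the parameter the bound is stated in terms of.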


Massively Parallel Computation via Remote Memory Access

Behnezhad, Dhulipala, Esfandiari, et al. 2019


We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round, all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by the previous empirical studies of distributed graph algorithms [28,9] using MapReduce and a distributed hash table service [17]. This extension allows us to give new graph algorithms with much lower round complexities compared to the best known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in $O(1)$ rounds and connectivity/minimum spanning tree in $O(\log \log_{m/n} n)$ rounds, both using $O(n^{\delta})$ space per machine for constant $\delta < 1$. In the same memory regime for MPC, the best known algorithms for these problems require $\mathrm{polylog}\, n$ rounds. Our results imply that the 2-Cycle conjecture, which is widely believed to hold in the MPC model, does not hold in the AMPC model.
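The effect of adaptive reads can be illustrated with a toy Python sketch. It is not an algorithm from the paper and does not model MPC or AMPC faithfully: it treats a dictionary as the read-only data store from the previous round and counts how many rounds a single machine needs to walk around a cycle when it may issue at most `budget` dependent reads per round. The function name and the `budget` parameter are hypothetical.

```python
def rounds_to_walk(successor, start, budget):
    """Rounds one machine needs to return to `start` when it may make at most
    `budget` adaptive reads of the data store `successor` per round."""
    rounds, current = 0, start
    while True:
        rounds += 1
        for _ in range(budget):
            # Each read depends on the result of the previous read.
            current = successor[current]
            if current == start:
                return rounds

n = 16
cycle = {v: (v + 1) % n for v in range(n)}   # data store: vertex -> next vertex
print(rounds_to_walk(cycle, 0, budget=1))    # 16: one dependent read per round
print(rounds_to_walk(cycle, 0, budget=4))    # 4: several dependent reads per round
```

With a budget of $b$ reads the walk takes $\lceil n/b \rceil$ rounds in this toy setting. Actual MPC algorithms are not limited to one hop per round (pointer doubling is a standard technique), so the sketch only shows why chaining reads within a round can reduce round counts.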



Warren Schudy

How to rank with few errors

Correlation Clustering with Noisy Input

Concentration and Moment Inequalities for Polynomials of Independent Random Variables

Massively Parallel Computation via Remote Memory Access
