In this paper we develop a theory of matrix completion for the extreme case of noisy 1-bit observations. Instead of observing a subset of the real-valued entries of a matrix M, we obtain a small number of binary (1-bit) measurements generated according to a probability distribution determined by the real-valued entries of M. The central question we ask is whether or not it is possible to obtain an accurate estimate of M from this data. In general this would seem impossible, but we show that the maximum likelihood estimate under a suitable constraint returns an accurate estimate of M when ‖M‖_∞ ≤ α and rank(M) ≤ r. If the log-likelihood is a concave function (e.g., the logistic or probit observation models), then we can obtain this maximum likelihood estimate by optimizing a convex program. In addition, we show that if, instead of recovering M, we simply wish to obtain an estimate of the distribution generating the 1-bit measurements, then we can eliminate the requirement that ‖M‖_∞ ≤ α. For both cases, we provide lower bounds showing that these estimates are near-optimal. We conclude with a suite of experiments that both verify the implications of our theorems and illustrate some of the practical applications of 1-bit matrix completion. In particular, we compare our program to standard matrix completion methods on movie rating data in which users submit ratings from 1 to 5. In order to use our program, we quantize this data to a single bit, but we allow the standard matrix completion program to have access to the original ratings (from 1 to 5). Surprisingly, the approach based on binary data performs significantly better.
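To make the observation model concrete, the following minimal sketch samples 1-bit measurements under the logistic model and evaluates the log-likelihood of a candidate matrix. The function names, the toy rank-1 matrix, and the sampling rate of 0.7 are illustrative assumptions, not taken from the paper. The constrained maximization itself is not shown.

```python
import math
import random

def logistic(x):
    """Logistic link f(x) = 1 / (1 + exp(-x)); here f(M_ij) plays the role of P(Y_ij = +1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sample_one_bit(M, observed, rng):
    """Draw Y_ij = +1 with probability f(M_ij) and -1 otherwise, for each observed index."""
    return {(i, j): (1 if rng.random() < logistic(M[i][j]) else -1)
            for (i, j) in observed}

def log_likelihood(X, Y):
    """Log-likelihood of a candidate matrix X given the observed bits Y."""
    total = 0.0
    for (i, j), y in Y.items():
        p = logistic(X[i][j])
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical toy instance: rank-1 matrix M = u v^T with all entries in [-1, 1].
u = [1.0, -1.0, 0.5]
v = [1.0, 0.5, -1.0, -0.5]
M = [[ui * vj for vj in v] for ui in u]

rng = random.Random(0)
# Observe each entry independently with probability 0.7 (an assumed sampling rate).
observed = [(i, j) for i in range(3) for j in range(4) if rng.random() < 0.7]
Y = sample_one_bit(M, observed, rng)

print("observed bits:", len(Y))
print("log-likelihood at M:", log_likelihood(M, Y))
```

Because the logistic log-likelihood is concave in the entries of X, maximizing it over a convex constraint set is a convex program, which is the property the abstract relies on.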
We study the performance of Reed-Solomon (RS) codes for the exact repair problem in distributed storage. Our main result is that, in some parameter regimes, Reed-Solomon codes are optimal regenerating codes among MDS codes with linear repair schemes. Moreover, we give a characterization of MDS codes with linear repair schemes which holds in any parameter regime, and which can be used to give non-trivial repair schemes for RS codes in other settings. More precisely, we show that for k-dimensional RS codes whose evaluation points are a finite field of size n, there are exact repair schemes with bandwidth (n − 1) log((n − 1)/(n − k)) bits, and that this is optimal for any MDS code with a linear repair scheme. In contrast, the naive (commonly implemented) repair algorithm for this RS code has bandwidth k log(n) bits. When the entire field is used as evaluation points, the number of nodes n is much larger than the number of bits per node (which is O(log(n))), and so this result holds only when the degree of sub-packetization is small. However, our method applies in any parameter regime, and to illustrate this for high levels of sub-packetization we give an improved repair scheme for a specific (14,10)-RS code used in the Facebook Hadoop Analytics cluster.
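The two bandwidth expressions can be compared numerically. The sketch below assumes that the logarithms are base 2 (since the bandwidth is stated in bits) and uses illustrative parameters n = 16, k = 8 that do not come from the paper. The function names are hypothetical.

```python
import math

def rs_repair_bandwidth_bits(n, k):
    """Bandwidth (n - 1) * log2((n - 1) / (n - k)) from the abstract, assuming log base 2."""
    if not 1 < k < n:
        raise ValueError("expected 1 < k < n")
    return (n - 1) * math.log2((n - 1) / (n - k))

def naive_repair_bandwidth_bits(n, k):
    """Naive repair: download k whole symbols of log2(n) bits each."""
    return k * math.log2(n)

n, k = 16, 8  # illustrative values only
improved = rs_repair_bandwidth_bits(n, k)
naive = naive_repair_bandwidth_bits(n, k)
print(f"improved: {improved:.2f} bits, naive: {naive:.2f} bits")
```

For these parameters the improved scheme needs fewer than half the bits of the naive one. The gap depends on how k compares with n.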
Machine learning relies on the assumption that unseen test instances of a classification problem follow the same distribution as observed training data. However, this principle can break down when machine learning is used to make important decisions about the welfare (employment, education, health) of strategic individuals. Knowing information about the classifier, such individuals may manipulate their attributes in order to obtain a better classification outcome. As a result of this behavior, often referred to as gaming, the
Binary measurements arise naturally in a variety of statistical and engineering applications. They may be inherent to the problem (e.g., in determining the relationship between genetics and the presence or absence of a disease) or they may be a result of extreme quantization. A recent influx of literature has suggested that using prior signal information can greatly improve the ability to reconstruct a signal from binary measurements. This is exemplified by one-bit compressed sensing, which takes the compressed sensing model but assumes that only the sign of each measurement is retained. It has recently been shown that the number of one-bit measurements required for signal estimation mirrors that of unquantized compressed sensing. Indeed, s-sparse signals in ℝ^n can be estimated (up to normalization) from Ω(s log(n/s)) one-bit measurements. Nevertheless, controlling the precise accuracy of the error estimate remains an open challenge. In this paper, we focus on optimizing the decay of the error as a function of the oversampling factor λ := m/(s log(n/s)), where m is the number of measurements. It is known that the error in reconstructing sparse signals from standard one-bit measurements is bounded below by Ω(λ^{-1}). Without adjusting the measurement procedure, reducing this polynomial error decay rate is impossible. However, we show that an adaptive choice of the thresholds used for quantization may lower the error rate to e^{-Ω(λ)}. This improves upon guarantees for other methods of adaptive thresholding as proposed in Sigma-Delta quantization. We develop a general recursive strategy to achieve this exponential decay and two specific polynomial-time algorithms which fall into this framework, one based on convex programming and one on hard thresholding. This work is inspired by the one-bit compressed sensing model, in which the engineer controls the measurement procedure. Nevertheless, the principle is extendable to signal reconstruction problems in a variety of binary statistical models as well as statistical estimation problems like logistic regression.
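The benefit of adaptive thresholds can be seen in a scalar toy analogy, which is not the paper's algorithm: to estimate an unknown x in [-1, 1] from one-bit measurements sign(x − τ), choosing each threshold τ by bisection halves the uncertainty interval per measurement, so the error decays exponentially in the number of measurements m. The function names and the test value x = 0.3 are illustrative assumptions.

```python
def one_bit(x, tau):
    """One-bit measurement with threshold tau: +1 if x >= tau, else -1."""
    return 1 if x >= tau else -1

def bisect_estimate(x, m, lo=-1.0, hi=1.0):
    """Estimate x in [lo, hi] from m one-bit measurements with adaptive thresholds."""
    for _ in range(m):
        tau = (lo + hi) / 2.0       # threshold chosen from all previous bits
        if one_bit(x, tau) == 1:
            lo = tau                # x lies in the upper half
        else:
            hi = tau                # x lies in the lower half
    return (lo + hi) / 2.0

x = 0.3  # hypothetical unknown signal value
for m in (5, 10, 20):
    err = abs(bisect_estimate(x, m) - x)
    print(f"m = {m:2d}  error = {err:.2e}  bound 2^-m = {2.0 ** -m:.2e}")
```

In this toy setting the error after m measurements is at most 2^{-m}. The sparse, high-dimensional setting treated in the abstract requires the recursive strategy described there.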