A coreset (or core-set) of an input set is a small summary of it, such that solving a problem on the coreset as input provably yields the same result as solving the same problem on the original (full) set, for a given family of problems (models, classifiers, loss functions). Over the past decade, coreset construction algorithms have been suggested for many fundamental problems in, e.g., machine/deep learning, computer vision, graphics, databases, and theoretical computer science. This introductory paper was written following requests from readers (usually non-experts, but also colleagues) regarding the many inconsistent coreset definitions, the lack of available source code, the deep theoretical background required from different fields, and the dense papers that make it hard for beginners to apply coresets and develop new ones. The paper provides folklore, classic, and simple results, including step-by-step proofs and figures, for the simplest (accurate) coresets of very basic problems such as: sum of vectors, minimum enclosing ball, SVD/PCA, and linear regression. Nevertheless, we did not find most of their constructions in the literature. Moreover, we expect that putting them together in a retrospective context will help the reader grasp modern results that usually extend and generalize these fundamental observations. Experts might appreciate the unified notation and the comparison table that links existing results. Open source code with example scripts is provided for all the presented algorithms, to demonstrate their practical usage and to support readers who are more familiar with programming than with math.
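To make the notion of an accurate coreset concrete, the following Python sketch implements the folklore Carathéodory construction for the sum-of-vectors problem mentioned above: any weighted set of n points in R^d can be reduced to at most d+1 reweighted input points whose weighted sum is exactly preserved. This is a minimal illustration under our own naming and numerical tolerances, not the paper's released code.

```python
import numpy as np

def caratheodory(P, u, tol=1e-12):
    """Accurate coreset for the weighted sum of vectors (Caratheodory's
    theorem): returns at most d+1 of the input points, reweighted so that
    the weighted sum  P.T @ u  is preserved exactly (up to round-off).

    P : (n, d) array of points;  u : (n,) nonnegative weights.
    """
    P, u = np.asarray(P, float), np.asarray(u, float)
    while len(u) > P.shape[1] + 1:
        # Find v != 0 with sum(v) = 0 and sum_i v_i * p_i = 0. Such a v
        # exists whenever n > d + 1; we take a null-space vector of the
        # d x (n-1) matrix of differences and lift it back to length n.
        A = (P[1:] - P[0]).T
        w = np.linalg.svd(A)[2][-1]          # last right-singular vector
        v = np.concatenate(([-w.sum()], w))  # sum(v) = 0 and P.T @ v = 0
        if not np.any(v > tol):              # ensure some positive entry
            v = -v
        # Shift weights along v until the first weight hits zero; this
        # changes neither the total weight nor the weighted sum.
        pos = v > tol
        u = u - np.min(u[pos] / v[pos]) * v
        keep = u > tol
        P, u = P[keep], u[keep]
    return P, u

# Usage: 1000 points in R^5 shrink to at most 6 points with the same
# weighted sum, i.e., np.allclose(Pc.T @ uc, P.T @ u) holds.
P, u = np.random.randn(1000, 5), np.full(1000, 1.0 / 1000)
Pc, uc = caratheodory(P, u)
```

Each loop iteration removes at least one point while preserving both the total weight and the weighted sum, which is exactly why the resulting coreset is accurate rather than merely approximate.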
An ε-coreset for Least-Mean-Squares (LMS) of a matrix A ∈ R^{n×d} is a small weighted subset of its rows that approximates the sum of squared distances from its rows to every affine k-dimensional subspace of R^d, up to a factor of 1±ε. Such coresets are useful for hyper-parameter tuning and for solving many least-mean-squares problems such as low-rank approximation (k-SVD), k-PCA, Lasso/Ridge/linear regression, and many more. Coresets are also useful for handling streaming, dynamic, and distributed big data in parallel. With high probability, non-uniform sampling based on upper bounds on what is known as the importance or sensitivity of each row of A yields a coreset. The size of the (sampled) coreset is then near-linear in the total sum of these sensitivity bounds. We provide algorithms that compute provably tight bounds on the sensitivity of each input row. They are based on two ingredients: (i) an iterative algorithm that computes the exact sensitivity of each point, up to arbitrarily small precision, for (non-affine) k-subspaces, and (ii) a general reduction of independent interest from computing sensitivities for the family of affine k-subspaces in R^d to (non-affine) (k+1)-subspaces in R^{d+1}. Experimental results on real-world datasets, including the English Wikipedia document-term matrix, show that our bounds yield significantly smaller, data-dependent coresets in practice. Full open-source code is also provided.
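As a quick illustration of the sensitivity-sampling framework described above, here is a minimal Python sketch of the generic sampling step: rows are drawn with probability proportional to their sensitivity upper bounds and reweighted so that weighted coreset losses are unbiased estimates of the full losses. The function name and interface are illustrative; the paper's contribution is the tight computation of the bounds s themselves, which are treated here as given.

```python
import numpy as np

def sensitivity_sample(A, s, m, rng=None):
    """Generic sensitivity (importance) sampling: draw m rows of A i.i.d.
    with probability p_i proportional to the sensitivity bound s_i, and
    weight each sampled row by 1 / (m * p_i), so that for any query
    (e.g., an affine k-subspace) the weighted sum of per-row losses is an
    unbiased estimator of the full sum over all n rows.

    A : (n, d) data matrix;  s : (n,) sensitivity upper bounds;  m : size.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(s, float) / np.sum(s)       # sampling distribution
    idx = rng.choice(len(p), size=m, p=p)      # i.i.d. row indices
    w = 1.0 / (m * p[idx])                     # importance weights
    return A[idx], w
```

Tighter bounds s_i concentrate the sampling on the rows that matter most, which is why the total sum of the bounds, rather than n, governs the coreset size m needed for a 1±ε guarantee.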