Dmitrii Avdiukhin scite author profile

Dmitrii Avdiukhin

5 Publications

16 Citation Statements Received

168 Citation Statements Given

How they've been cited

How they cite others

134

161

Affiliations

Indiana University Bloomington, Indiana University

Publications


Adversarially Robust Submodular Maximization under Knapsack Constraints

Avdiukhin, Mitrović, Yaroslavtsev et al.

2019


We propose the first adversarially robust algorithm for monotone submodular maximization under single and multiple knapsack constraints, with scalable implementations in distributed and streaming settings. For a single knapsack constraint, our algorithm outputs a robust summary of almost optimal (up to polylogarithmic factors) size, from which a constant-factor approximation to the optimal solution can be constructed. For multiple knapsack constraints, our approximation is within a constant factor of the best known non-robust solution. We evaluate the performance of our algorithms by comparison to natural robustifications of existing non-robust algorithms under two objectives: 1) dominating set for large social network graphs from Facebook and Twitter collected by the Stanford Network Analysis Project (SNAP), and 2) movie recommendations on a dataset from MovieLens. Experimental results show that our algorithms give the best objective for a majority of the inputs and show strong performance even compared to offline algorithms that are given the set of removals in advance.

$\mathrm{OPT}(V) = \arg\max \ldots$ Since the constraints are scaling-invariant, one can rescale each row $C_i$ by multiplying it (and the corresponding entry in $b$) by $b_1/b_i$, so that all entries in $b$ are the same and equal to $b_1$. One can further rescale $C$ and $b$ by the smallest entry in $C$ (or some lower bound on it), so that $\min_{i,j} C_{i,j} \ge 1$. We assume such rescaling below and let $K = b_i$ for all $i$. In the case of one constraint ($d = 1$), we further simplify the notation and set $c(e_i) = C_{1,i}$ and $K = b_1$, and refer to $c(e_i)$ simply as the cost of the $i$-th item. An important role in our algorithms is played by the marginal density of an item. Formally, for a set $S \subseteq V$, an element $e$ and a cost function $c : V \to \mathbb{R}_{\ge 0}$, we define the marginal density of $e$
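The two-step rescaling of the constraints $C x \le b$ described in the excerpt can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `rescale_knapsack` is hypothetical, constraints are given as plain lists (row `C[i]` with budget `b[i]`), and it assumes every cost and budget is strictly positive so that the smallest entry of `C` can serve as the divisor.

```python
from typing import List, Tuple


def rescale_knapsack(C: List[List[float]], b: List[float]) -> Tuple[List[List[float]], float]:
    """Normalize d knapsack constraints C x <= b (hypothetical helper).

    Assumes every cost C[i][j] and every budget b[i] is strictly positive.
    Returns the rescaled cost matrix and the common budget K.
    """
    if any(bi <= 0 for bi in b) or any(c <= 0 for row in C for c in row):
        raise ValueError("this sketch assumes strictly positive costs and budgets")
    # Step 1: multiply row i (and b[i]) by b[0] / b[i], so every budget becomes b[0].
    scaled = [[c * b[0] / bi for c in row] for row, bi in zip(C, b)]
    # Step 2: divide C and the common budget by the smallest entry, so min C[i][j] == 1.
    m = min(c for row in scaled for c in row)
    scaled = [[c / m for c in row] for row in scaled]
    K = b[0] / m
    return scaled, K


if __name__ == "__main__":
    C, K = rescale_knapsack([[2, 4], [1, 3]], [8, 2])
    print(C, K)  # [[1.0, 2.0], [2.0, 6.0]] 4.0
```

Because each row and its budget are multiplied by the same positive factor, and then everything is divided by the same positive constant, the set of feasible solutions is unchanged; only the units of cost change.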


Multi-dimensional balanced graph partitioning via projected gradient descent

2019


Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to most previous work, we study the multi-dimensional variant, in which balance according to multiple weight functions is required. As we demonstrate by experimental evaluation, such multi-dimensional balance is essential for achieving performance improvements for typical distributed graph processing workloads. We propose a new scalable technique for the multi-dimensional balanced graph partitioning problem. The method is based on applying randomized projected gradient descent to a non-convex continuous relaxation of the objective. We show how to implement the new algorithm efficiently in both theory and practice, utilizing various approaches for the projection step. Experiments with large-scale graphs with up to 800B edges indicate that our algorithm has superior performance compared with state-of-the-art approaches.
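A toy two-way version of the projected-gradient idea might look like the sketch below. Everything here is an assumption for illustration, not the paper's method: the relaxation maximizes Σ x_u·x_v over edges with x in [-1, 1]^n (for x in {-1, 1}^n this equals |E| minus twice the cut size), each weight function w_j gives a balance slab |⟨w_j, x⟩| ≤ eps·Σ w_j, the projection onto box ∩ slabs is approximated by alternating projections rather than computed exactly, weights are assumed nonnegative, and the final labels come from independent randomized rounding. The names `project` and `pgd_partition` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def project(x: List[float], weights: Sequence[Sequence[float]], eps: float,
            rounds: int = 50) -> List[float]:
    """Approximately project x onto [-1, 1]^n intersected with the balance slabs
    |<w_j, x>| <= eps * sum(w_j), using alternating (not exact) projections."""
    for _ in range(rounds):
        for w in weights:
            s = sum(wi * xi for wi, xi in zip(w, x))
            bound = eps * sum(w)
            norm2 = sum(wi * wi for wi in w)
            if abs(s) > bound:
                # move x along w just far enough to land on the nearest slab face
                target = bound if s > 0 else -bound
                shift = (s - target) / norm2
                x = [xi - shift * wi for xi, wi in zip(x, w)]
        x = [max(-1.0, min(1.0, xi)) for xi in x]  # clip to the box
    return x


def pgd_partition(n: int, edges: List[Tuple[int, int]], weights: Sequence[Sequence[float]],
                  eps: float = 0.05, steps: int = 100, lr: float = 0.1,
                  seed: int = 0) -> List[int]:
    """Toy 2-way balanced partitioning by projected gradient ascent on the
    relaxation max sum_{(u,v) in E} x_u * x_v over x in [-1, 1]^n."""
    rng = random.Random(seed)
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    x = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # random start near the centre
    for _ in range(steps):
        # gradient of the objective: d/dx_u = sum of x_v over neighbours v of u
        grad = [sum(x[v] for v in adj[u]) for u in range(n)]
        x = project([xi + lr * gi for xi, gi in zip(x, grad)], weights, eps)
    # randomized rounding: u goes to part 1 with probability (1 + x_u) / 2,
    # so the corresponding +/-1 vector equals x in expectation
    return [1 if rng.random() < (1 + xu) / 2 else 0 for xu in x]


if __name__ == "__main__":
    # two triangles joined by one edge; balance by vertex count and by degree
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    degrees = [2, 2, 3, 3, 2, 2]
    print(pgd_partition(6, edges, [[1] * 6, degrees]))
```

The multi-dimensional aspect shows up only in `weights`: each extra weight function adds one more slab to the projection, while the gradient step is unchanged. Since rounding is independent per vertex, balance holds only in expectation.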


Objective-Based Hierarchical Clustering of Deep Embedding Vectors

Naumov, Yaroslavtsev, Avdiukhin

2020

Preprint


We initiate a comprehensive experimental study of objective-based hierarchical clustering methods on massive datasets consisting of deep embedding vectors from computer vision and NLP applications. This includes a large variety of image embedding (ImageNet, ImageNetV2, NaBirds), word embedding (Twitter, Wikipedia), and sentence embedding (SST-2) vectors from several popular recent models (e.g., ResNet, ResNext, Inception V3, SBERT). Our study includes datasets with up to 4.5 million entries with embedding dimensions up to 2048. In order to address the challenge of scaling up hierarchical clustering to such large datasets, we propose a new practical hierarchical clustering algorithm, B++&C. It gives a 5%/20% improvement on average for the popular Moseley-Wang (MW) / Cohen-Addad et al. (CKMM) objectives (normalized) compared to a wide range of classic methods and recent heuristics. We also introduce a theoretical algorithm, B2SAT&C, which achieves a 0.74-approximation for the CKMM objective in polynomial time. This is the first substantial improvement over the trivial 2/3-approximation achieved by a random binary tree. Prior to this work, the best poly-time approximation of ≈ 2/3 + 0.0004 was due to Charikar et al. (SODA'19).
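To make the two objectives concrete, the sketch below evaluates them on a binary tree using their standard definitions: MW(T) = Σ_{i<j} w_ij (n − |leaves(T[i ∨ j])|) for similarities w_ij, and CKMM(T) = Σ_{i<j} d_ij |leaves(T[i ∨ j])| for dissimilarities d_ij, where T[i ∨ j] is the subtree rooted at the lowest common ancestor of i and j. It computes the unnormalized values only; the normalization used in the paper is not reproduced, and this is not B++&C. The nested-tuple tree encoding and the function name `mw_and_ckmm` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple, Union

# A binary tree is either a leaf index or a pair (left_subtree, right_subtree).
Tree = Union[int, Tuple["Tree", "Tree"]]


def mw_and_ckmm(tree: Tree, n: int,
                sim: Callable[[int, int], float],
                dist: Callable[[int, int], float]) -> Tuple[float, float]:
    """Unnormalized Moseley-Wang (similarity) and Cohen-Addad et al. (dissimilarity)
    objectives of a binary tree over the leaves 0..n-1."""
    mw = 0.0
    ckmm = 0.0

    def visit(t: Tree) -> List[int]:
        nonlocal mw, ckmm
        if isinstance(t, int):
            return [t]
        left, right = visit(t[0]), visit(t[1])
        size = len(left) + len(right)  # |leaves(T[i v j])| for every pair split here
        for i, j in product(left, right):
            mw += sim(i, j) * (n - size)
            ckmm += dist(i, j) * size
        return left + right

    visit(tree)
    return mw, ckmm


if __name__ == "__main__":
    close_pairs = {frozenset({0, 1}), frozenset({2, 3})}

    def sim(i: int, j: int) -> float:
        return 1.0 if frozenset({i, j}) in close_pairs else 0.0

    def dist(i: int, j: int) -> float:
        return 1.0 - sim(i, j)

    print(mw_and_ckmm(((0, 1), (2, 3)), 4, sim, dist))  # (4.0, 16.0)
    print(mw_and_ckmm(((0, 2), (1, 3)), 4, sim, dist))  # (0.0, 12.0)
```

Both objectives are maximized: the tree that merges similar points early scores higher on MW and on CKMM, because dissimilar pairs are separated only at the large root cluster.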


"Bring Your Own Greedy"+Max: Near-Optimal $1/2$-Approximations for Submodular Knapsack

Avdiukhin, Yaroslavtsev, Zhou

2019

Preprint


The problem of selecting a small-size representative summary of a large dataset is a cornerstone of machine learning, optimization, and data science. Motivated by applications to recommendation systems and other scenarios with query-limited access to vast amounts of data, we propose a new rigorous algorithmic framework for a standard formulation of this problem as a submodular maximization subject to a linear (knapsack) constraint. Our framework is based on augmenting all partial Greedy solutions with the best additional item. It can be instantiated with negligible overhead in any model of computation, which allows the classic Greedy algorithm and its variants to be implemented. We give such instantiations in the offline (Greedy+Max), multi-pass streaming (Sieve+Max), and distributed (Distributed Sieve+Max) settings. Our algorithms give a $(1/2 - \varepsilon)$-approximation, with most other key parameters of interest being near-optimal. Our analysis is based on a new set of first-order linear differential inequalities and their robust approximate versions. Experiments on typical datasets (movie recommendations, influence maximization) confirm the scalability and high quality of solutions obtained via our framework. Instance-specific approximations are typically in the 0.6-0.7 range and frequently beat even the (1 − 1/e) ≈ 0.63 worst-case barrier for polynomial-time algorithms.
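The core idea of the framework, augmenting every partial Greedy solution with the best additional item, can be illustrated with a minimal offline sketch. Assumptions, not taken from the abstract: the objective is set coverage (a standard monotone submodular example), Greedy repeatedly adds the item with the largest marginal density among items that still fit the budget, the "+Max" step picks the item of largest value among those fitting the remaining budget, and all costs are strictly positive. The function names are hypothetical, and the streaming and distributed variants are not shown.

```python
from typing import List, Sequence, Set, Tuple


def coverage(sets: Sequence[Set[int]], chosen: List[int]) -> int:
    """Monotone submodular example objective: number of elements covered."""
    covered: Set[int] = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy_plus_max(sets: Sequence[Set[int]], cost: Sequence[float],
                    K: float) -> Tuple[int, List[int]]:
    """Density Greedy under a knapsack budget K, where every partial Greedy
    solution is augmented with the single best item that still fits."""
    greedy: List[int] = []
    spent = 0.0
    best_val, best_sol = 0, []
    while True:
        base = coverage(sets, greedy)
        fitting = [e for e in range(len(sets)) if e not in greedy and spent + cost[e] <= K]
        if not fitting:
            break
        # "+Max" step: best single augmentation of the current partial solution
        for e in fitting:
            val = coverage(sets, greedy + [e])
            if val > best_val:
                best_val, best_sol = val, greedy + [e]
        # Greedy step: add the fitting item with the largest marginal density
        e = max(fitting, key=lambda x: (coverage(sets, greedy + [x]) - base) / cost[x])
        greedy.append(e)
        spent += cost[e]
    return best_val, best_sol


if __name__ == "__main__":
    sets = [{1, 2, 3, 4, 5, 6}, {1, 2}, {3, 4}, {5}]
    print(greedy_plus_max(sets, [10, 1, 1, 1], 10))  # (6, [0])
```

In this instance plain density Greedy takes the three cheap items and covers 5 elements, after which the expensive item no longer fits; the "+Max" step applied to the empty partial solution recovers the better single item. The final Greedy solution never needs a separate check, because its last item was already considered as an augmentation of the previous partial solution.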


Multi-Dimensional Balanced Graph Partitioning via Projected Gradient Descent

Avdiukhin, Pupyrev, Yaroslavtsev

2019

Preprint
