Subspace clustering has been widely applied to detect meaningful clusters in high-dimensional data spaces, and sparse subspace clustering (SSC) achieves superior clustering performance by solving a relaxed ℓ0-minimization problem with the ℓ1-norm. Although replacing the ℓ0-norm with the ℓ1-norm makes the objective function convex, it can cause large errors on large coefficients in some cases. In this paper, we study a sparse subspace clustering algorithm based on a nonconvex modeling formulation. Specifically, we introduce a nonconvex pseudo-norm that approximates ℓ0-minimization better than the traditional ℓ1-minimization framework and consequently yields a better affinity matrix. However, this formulation makes the optimization challenging because the traditional alternating direction method of multipliers (ADMM) struggles to solve the nonconvex subproblems. In view of this, reweighted techniques are employed to make these subproblems convex and easily solvable. We provide several guarantees for the convergence results, proving that the nonconvex algorithm is globally convergent to a critical point. Experiments on two real-world problems, motion segmentation and face clustering, show that our method outperforms state-of-the-art techniques.
INDEX TERMS: Sparse subspace clustering, nonconvex approximation, ADMM, reweighted algorithms.
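To make the reweighting idea concrete, the following is a minimal Python/NumPy sketch of an iteratively reweighted ℓ1 scheme for the SSC self-representation step. It uses a generic proximal-gradient inner solver rather than the paper's ADMM, and the loss form, the weight update w_ij = 1/(|c_ij| + eps), and all parameter values are illustrative assumptions, not the exact nonconvex pseudo-norm proposed above.

    # Sketch of a reweighted l1 surrogate for sparse subspace clustering (assumptions:
    # loss 0.5*||X - X C||_F^2 + lam * sum_ij W_ij |C_ij| with diag(C) = 0; lam, eps
    # and iteration counts are illustrative, not the paper's settings).
    import numpy as np

    def reweighted_ssc(X, lam=0.1, eps=1e-3, outer_iters=5, inner_iters=200):
        """X: d x n data matrix, columns are points. Returns an n x n affinity matrix."""
        n = X.shape[1]
        C = np.zeros((n, n))
        W = np.ones((n, n))                      # all-ones weights = plain l1 on the first pass
        L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the smooth part
        for _ in range(outer_iters):
            for _ in range(inner_iters):         # proximal gradient on the weighted l1 problem
                grad = X.T @ (X @ C - X)
                Z = C - grad / L
                C = np.sign(Z) * np.maximum(np.abs(Z) - lam * W / L, 0.0)  # soft-thresholding
                np.fill_diagonal(C, 0.0)         # enforce the zero-diagonal constraint
            W = 1.0 / (np.abs(C) + eps)          # small coefficients receive larger weights
        return np.abs(C) + np.abs(C).T           # symmetric affinity matrix

The returned affinity matrix would then be fed to spectral clustering, as in standard SSC pipelines.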
The proliferation of massive datasets has led to significant interest in distributed algorithms for solving large-scale machine learning problems. However, communication overhead is a major bottleneck that hampers the scalability of distributed machine learning systems. In this paper, we design two communication-efficient algorithms for distributed learning tasks. The first, EF-SIGNGD, uses 1-bit (sign-based) gradient quantization to save communication bits, together with the error feedback technique, i.e., incorporating the error made by the compression operator into the next step, to guarantee convergence. The second, LE-SIGNGD, adds a well-designed lazy gradient aggregation rule to EF-SIGNGD that detects gradients with small changes and reuses the outdated information, saving communication costs in both transmitted bits and communication rounds. Furthermore, we show that LE-SIGNGD is convergent under mild assumptions. The effectiveness of the two proposed algorithms is demonstrated through experiments on both real and synthetic data.
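As a rough illustration of the first ingredient, here is a minimal single-worker Python/NumPy sketch of sign compression with error feedback. The rescaled sign compressor, the function names, and the step size are assumptions made for illustration; the paper's distributed protocol and the lazy aggregation rule of LE-SIGNGD are not reproduced here.

    # Hedged sketch of 1-bit compression with error feedback (single worker; the
    # (||p||_1 / d) * sign(p) scaling and all parameters are illustrative assumptions).
    import numpy as np

    def compress_sign(p):
        """1-bit compressor: keep the signs, rescale by the mean magnitude."""
        return (np.linalg.norm(p, 1) / p.size) * np.sign(p)

    def ef_sign_step(x, grad_fn, error, lr=0.01):
        """One step: fold the stored compression error into the gradient before compressing."""
        p = lr * grad_fn(x) + error       # error-corrected gradient
        delta = compress_sign(p)          # 1-bit message that would actually be transmitted
        x_new = x - delta                 # model update with the compressed direction
        error_new = p - delta             # remember what the compressor discarded
        return x_new, error_new

    # usage sketch on a toy quadratic f(x) = 0.5 * ||x||^2, whose gradient is x
    x, err = np.random.randn(10), np.zeros(10)
    for _ in range(100):
        x, err = ef_sign_step(x, lambda v: v, err)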
The generalization ability often determines the success of machine learning algorithms in practice. Therefore, it is of great theoretical and practical importance to understand and bound the generalization error of machine learning algorithms. In this paper, we provide the first generalization results for the popular stochastic gradient descent (SGD) algorithm in the distributed asynchronous decentralized setting. Our analysis is based on the uniform stability tool, where stability means that the learned model does not change much under small perturbations of the training set. Under some mild assumptions, we perform a comprehensive generalization analysis of asynchronous decentralized SGD, including generalization error and excess generalization error bounds for the strongly convex, convex, and non-convex cases. Our theoretical results reveal the effects of the learning rate, training data size, training iterations, decentralized communication topology, and asynchronous delay on the generalization performance of asynchronous decentralized SGD. We also study the optimization error with respect to the objective function values and investigate how the initial point affects the excess generalization error. Finally, we conduct extensive experiments on the MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets to validate the theoretical findings.
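For readers unfamiliar with the setting, the sketch below shows one round of decentralized SGD with gossip averaging over a doubly stochastic mixing matrix, in Python/NumPy. The ring topology, learning rate, and the idea of passing possibly stale gradients to mimic asynchronous delay are illustrative assumptions, not the exact update rule analyzed in the paper.

    # Minimal sketch of one round of decentralized SGD with gossip averaging
    # (ring topology, step size, and the simple staleness model are assumptions).
    import numpy as np

    def ring_mixing_matrix(n):
        """Doubly stochastic matrix for a ring: each node averages with its two neighbors."""
        W = np.zeros((n, n))
        for i in range(n):
            W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
        return W

    def decentralized_sgd_round(params, stale_grads, W, lr=0.1):
        """params: n x d local models; stale_grads: n x d (possibly delayed) local gradients."""
        mixed = W @ params                 # gossip step: average with neighbors
        return mixed - lr * stale_grads    # local SGD step with (possibly stale) gradients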