2019
DOI: 10.1007/s10589-019-00140-7

Markov chain block coordinate descent

Abstract: The method of block coordinate gradient descent (BCD) has been a powerful method for large-scale optimization. This paper considers the BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision processes, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixi…
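For intuition about the block-selection rule described in the abstract, below is a minimal runnable sketch (our illustration, not the authors' implementation): a least-squares objective whose coordinates are split into blocks, where the next block to update is drawn from a Markov chain rather than i.i.d. or cyclically. The quadratic objective, the lazy ring-walk transition matrix, and all parameter values are assumptions made for illustration.

```python
import numpy as np

# Minimal sketch of block coordinate gradient descent (BCD) with blocks
# selected by a Markov chain, for f(x) = 0.5 * ||Ax - b||^2.
# The objective and the lazy-ring-walk transition matrix are illustrative
# assumptions, not the paper's specific setup.

rng = np.random.default_rng(0)
n, m = 12, 4                          # dimension and number of blocks
A = rng.standard_normal((20, n))
b = rng.standard_normal(20)
blocks = np.split(np.arange(n), m)    # equal-size coordinate blocks

# Lazy random walk on a ring of blocks: from block i, stay with
# probability 1/2, else move to a neighbor. Neither i.i.d. nor cyclic.
P = np.zeros((m, m))
for i in range(m):
    P[i, i] = 0.5
    P[i, (i - 1) % m] = 0.25
    P[i, (i + 1) % m] = 0.25

x = np.zeros(n)
state = 0                                      # current block (chain state)
step = 1.0 / np.linalg.norm(A, 2) ** 2         # conservative stepsize

for _ in range(2000):
    idx = blocks[state]
    grad_block = A[:, idx].T @ (A @ x - b)     # partial gradient for block
    x[idx] -= step * grad_block                # update only this block
    state = rng.choice(m, p=P[state])          # next block via the chain

print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

Because consecutive block choices are correlated through the chain, standard i.i.d.-sampling analyses do not apply directly; the paper's mixing-time arguments address exactly this dependence.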

Cited by 34 publications (85 citation statements)
References 26 publications
“…In addition, it would be interesting to see whether the techniques developed herein can be exploited towards understanding model-free algorithms with more sophisticated exploration schemes [64]. Finally, asynchronous Q-learning on a single Markovian trajectory is closely related to coordinate descent with coordinates selected according to a Markov chain; one would naturally ask whether our analysis framework can yield improved convergence guarantees for general Markov-chain-based optimization algorithms [65], [66].…”
Section: Discussion
confidence: 99%
“…assumption on data samples. We obtain global convergence to stationary points of rate $O((\log n)^{1+\varepsilon}/n^{1/2})$, matching the optimal convergence rates for SGD-based methods [SSY18, DD19]. Interestingly, our analysis shows that SBMM (and hence SMM) is more adapted to solve empirical loss minimization than expected loss minimization, in the sense that the aforementioned rate of convergence holds for the empirical loss functions almost surely and in expectation for the expected loss function; an almost sure convergence for the empirical loss function is obtained at a slower rate of $O((\log n)^{1+\varepsilon}/n^{1/4})$.…”
Section: Introduction
confidence: 52%
“…Assumption (A4) states that the sequence of weights $w_n \in (0, 1]$ we use to recursively define the empirical loss (1) and surrogate loss (7) does not decay too fast, so that $\sum_{n=1}^{\infty} w_n = \infty$, but decays fast enough that $\sum_{n=1}^{\infty} w_n^2 < \infty$. This is analogous to requirements for stepsizes in stochastic gradient descent algorithms, where the stepsizes are usually required to be non-summable but square-summable (see, e.g., [SSY18]). Note that our general results do not require the stronger assumption $\sum_{n=1}^{\infty} w_n^2 \sqrt{n} < \infty$, which is standard in the literature [MBPS10, Mai13b, MMTV17, LNB20, LSN20].…”
Section: (A6)
confidence: 99%
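As a concrete check of the non-summable but square-summable requirement in this statement (a worked example of ours, not taken from the citing paper), the weights $w_n = 1/n$ satisfy both parts of (A4):

```latex
% Illustrative check: w_n = 1/n satisfies assumption (A4).
\begin{align*}
  \sum_{n=1}^{\infty} w_n   &= \sum_{n=1}^{\infty} \frac{1}{n} = \infty
    && \text{(harmonic series: non-summable)} \\
  \sum_{n=1}^{\infty} w_n^2 &= \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} < \infty
    && \text{(square-summable)}
\end{align*}
```

Under the reading of the stronger condition above, $w_n = 1/n$ also satisfies it, since $\sum_{n=1}^{\infty} n^{-2}\sqrt{n} = \sum_{n=1}^{\infty} n^{-3/2} < \infty$, whereas a sequence such as $w_n = 1/(\sqrt{n}\,\log n)$ meets (A4) but not the stronger condition.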