2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids) 2015
DOI: 10.1109/humanoids.2015.7363529

Regularized covariance estimation for weighted maximum likelihood policy search methods

Abstract: Many episode-based (or direct) policy search algorithms maintain a multivariate Gaussian distribution as the search distribution over the parameter space of some objective function. One class of algorithms, such as episodic REPS, PoWER, or PI², uses a weighted maximum likelihood estimate (WMLE) to update the mean and covariance matrix of this distribution in each iteration. However, due to the high dimensionality of covariance matrices and the limited number of samples, the WMLE is an unreliable estimator. The u…
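To make the WMLE update concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the function name `weighted_mle`, the exponential weighting of returns, and the `temperature` parameter are all illustrative assumptions, since each of the algorithms named in the abstract derives its sample weights in its own way. The sketch only shows the shared final step, in which a weighted mean and a weighted covariance are fitted to the sampled parameter vectors.

```python
import math


def weighted_mle(samples, returns, temperature=1.0):
    """Weighted maximum likelihood fit of a Gaussian (illustrative sketch).

    samples:     list of parameter vectors (lists of floats)
    returns:     one scalar return per sample (higher is better)
    temperature: hypothetical scale for the exponential weights
    """
    # Assumption: weights are an exponential transform of the returns.
    # Subtracting the best return keeps exp() numerically stable.
    best = max(returns)
    weights = [math.exp((r - best) / temperature) for r in returns]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise to sum to 1

    dim = len(samples[0])
    # Weighted mean of the samples.
    mean = [sum(w * x[i] for w, x in zip(weights, samples)) for i in range(dim)]
    # Weighted covariance around that mean.
    cov = [
        [
            sum(w * (x[i] - mean[i]) * (x[j] - mean[j])
                for w, x in zip(weights, samples))
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mean, cov


# Two samples in a 2-D parameter space: too few to span both dimensions.
mean, cov = weighted_mle([[0.0, 0.0], [2.0, 0.0]], [1.0, 1.0])
print(mean)  # [1.0, 0.0]
print(cov)   # [[1.0, 0.0], [0.0, 0.0]]
```

In this toy run the estimated covariance has zero variance along the second dimension, so the updated search distribution can no longer explore in that direction. This is a small-scale picture of why the abstract calls the WMLE unreliable when samples are few relative to the dimensionality.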

citations
Cited by 13 publications
(13 citation statements)
references
References 12 publications
“…While this algorithm can improve the search distribution, it can quickly result in a degenerated distribution which stops exploration and would lead to premature convergence, which is a problem in many of these methods [12,14]. The cause of this limitation is mainly the maximum likelihood estimate (Equation 10) which overfits the current individuals and change the current search distribution drastically [2].…”
Section: Stochastic Search By Expectation-maximisation (mentioning)
confidence: 99%
“…We seek to find a distribution over individuals x, denoted π(x; θ), that minimises the expected fitness…” (Footnote 2: In this paper σ is used to denote the variance, which we refer to as step size.)
Section: Preliminaries (mentioning)
confidence: 99%
“…In addition, REPS and RBF-REPS can suffer from premature convergence. In [12] Covariance Estimation with Controlled Entropy Reduction (CECER) algorithm was introduced to alleviate the premature convergence problem of REPS. Our new algorithm local CECER leverage from both nonlinear generalization over contexts and fully context dependent search distribution update rule while it also uses CECER algorithm concept [12] to avoid premature convergence.…”
Section: B Related Work (mentioning)
confidence: 99%