Feature Article: Optimization for simulation: Theory vs. Practice

Fu, Michael C.

doi:10.1287/ijoc.14.3.192.113

Cited by 699 publications

(424 citation statements)

References 36 publications

Supporting

Mentioning

405

Contrasting

Unclassified


“…The first simplification is the usage of the same random number generator for exploration across all seven motor primitives. This simplification is well known to introduce similar exploration over all degrees of freedom and has been shown to reduce the variance in the gradient estimate (Fu, 2002). It is necessary, as otherwise the exploration noise added in one DOF will often "fight" the exploration noise of other DOFs, resulting in very slow learning.…”

Section: Robot Application: Motor Primitive Learning For Baseball (mentioning)

confidence: 99%
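The variance-reduction idea in the statement above is closely related to the common random numbers technique from the simulation literature. The following is a minimal sketch of that general idea on a toy problem, not the cited paper's robot setup: the quadratic cost, the step size `h`, and all function names are assumptions made for illustration. It compares a finite-difference gradient estimate that reuses one random stream for both evaluations against one that uses independent streams.

```python
import random
import statistics


def noisy_cost(theta, rng):
    """One simulation replication of a toy cost: (theta - Z)^2 with Z ~ N(0, 1)."""
    z = rng.gauss(0.0, 1.0)
    return (theta - z) ** 2


def fd_gradient(theta, h, seed_plus, seed_minus):
    """Central finite-difference gradient estimate from two replications.

    Passing the same seed for both replications gives common random numbers;
    passing different seeds gives independent sampling.
    """
    rng_plus = random.Random(seed_plus)
    rng_minus = random.Random(seed_minus)
    up = noisy_cost(theta + h, rng_plus)
    down = noisy_cost(theta - h, rng_minus)
    return (up - down) / (2 * h)


def compare(theta=1.0, h=0.1, n=2000):
    """Return the sample variance of the estimator with and without shared seeds."""
    shared = [fd_gradient(theta, h, s, s) for s in range(n)]
    independent = [fd_gradient(theta, h, s, s + n) for s in range(n)]
    return statistics.variance(shared), statistics.variance(independent)


var_shared, var_independent = compare()
print(f"variance with common random numbers: {var_shared:.2f}")
print(f"variance with independent streams:   {var_independent:.2f}")
```

In this toy model the shared-seed estimator simplifies to 2θ − 2Z, so the noise in the two evaluations largely cancels, whereas with independent streams the noise is divided by the small step size and dominates the estimate.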

Reinforcement learning of motor skills with policy gradients

2008


Abstract: Autonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial in order to achieve true autonomy in advanced machines like humanoid robots. In this paper, we examine learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, it is only reinforcement learning that offers a general approach to the final trial-and-error improvement that is needed by each individual acquiring a skill. Neither neurobiological nor machine learning studies have, so far, offered compelling results on how reinforcement learning can be scaled to the high-dimensional continuous state and action spaces of humans or humanoids. Here, we combine two recent research developments on learning motor control in order to achieve this scaling. First, we interpret the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning. Second, we combine motor primitives with the theory of stochastic policy gradient learning, which currently seems to be the only feasible framework for reinforcement learning for humanoids. We evaluate different policy gradient methods with a focus on their applicability to parameterized motor primitives. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.


“…The M-LPM_n efficient frontier is the solution to expression (12) for different (a, µ). If in (12), instead of X ∈ Z we have X ∈ Z_0, the efficient frontier is generated by only the risky assets. Different from the M-V frontier, the M-LPM_n efficient frontier changes for different values of the pair (a, µ).…”

Section: The Portfolio Optimization Problem (mentioning)

confidence: 99%

“…This corresponds to the solution to problem (12) which contains preferences exhibiting both risk and loss aversion (as in expression (6)). The optimal solution to (12) lies between the M-V and M-LPM optimal portfolios. To illustrate the computations, we use LPM_1 as the downside risk measure.…”

Section: M-V Versus M-LPM Comparison (mentioning)

confidence: 99%
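For readers unfamiliar with the risk measure named in the statements above, the lower partial moment of order n with target a is commonly defined as LPM_n(a) = E[max(a − X, 0)^n]. The sketch below computes its sample version. The return figures, the target of zero, and the function name are illustrative assumptions, and the citing paper's own expression (12) is not reproduced in this listing.

```python
def lower_partial_moment(returns, target, order):
    """Sample lower partial moment: the mean of max(target - r, 0) ** order.

    Only returns below the target contribute, so upside deviations are ignored,
    unlike variance, which penalizes both sides.
    """
    if order <= 0:
        raise ValueError("order must be positive in this sketch")
    if not returns:
        raise ValueError("returns must not be empty")
    shortfalls = [max(target - r, 0.0) ** order for r in returns]
    return sum(shortfalls) / len(shortfalls)


# Hypothetical per-period returns, used only to exercise the function.
sample_returns = [-0.10, 0.02, 0.05, -0.03, 0.08]

lpm_1 = lower_partial_moment(sample_returns, target=0.0, order=1)
lpm_2 = lower_partial_moment(sample_returns, target=0.0, order=2)
print(lpm_1, lpm_2)
```

With order 1 the measure is the expected shortfall below the target, and with order 2 it is the target semivariance. Changing the target changes the measure itself, which is consistent with the statement that the M-LPM frontier moves with the pair (a, µ).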

“…less skewed return distributions, at low return levels the optimal portfolios have distributions very close to lognormal, and therefore the difference between the M-V and M-LPM optimal portfolios are small or zero if the M-V portfolios have zero LPM. Because the exponential function skews returns to the right, we make the magnitude of the largest positive jump smaller than the largest negative jump. In contrast, Liu, Longstaff and Pan (2003) used symmetric jump sizes.…”

Section: Jump-Diffusion Processes For Stock Prices (mentioning)

confidence: 99%


Downside Loss Aversion and Portfolio Management

Jarrow

Zhao

2005

SSRN Journal


Downside loss averse preferences have seen a resurgence in the portfolio management literature. This is due to the increasing usage of derivatives in managing equity portfolios, and the increased usage of quantitative techniques for bond portfolio management. We employ the lower partial moment as a risk measure for downside loss aversion, and compare mean-variance (M-V) and mean-lower partial moment (M-LPM) optimal portfolios under non-normal asset return distributions. When asset returns are nearly normally distributed, there is little difference between the optimal M-V and M-LPM portfolios. When asset returns are non-normal with large left tails, we document significant differences in M-V and M-LPM optimal portfolios. This observation is consistent with industry usage of M-V theory for equity portfolios, but not for fixed income portfolios.


“…One of the great difficulties in approaching a problem through optimization techniques based on simulation (simulation based optimization) lies in the development of an efficient algorithm (Fu, 2002). As we will subsequently see, the complexity of the developed algorithm is exponential.…”

Section: Computational Model (mentioning)

confidence: 99%

New product development projects evaluation under time uncertainty

Silva

Santiago

2009

Pesqui. Oper.


Development time is one of the key factors that contribute to the success of new product development. In spite of that, the impact of time uncertainty on development has received little attention in decision-support models for evaluating and valuing this kind of project. In this context, the objective of the present paper is to evaluate the development process of new technologies under time uncertainty. We introduce a model which captures this source of uncertainty and develop an algorithm to evaluate projects that incorporates Monte Carlo Simulation and Dynamic Programming. The novelty in our approach is to thoroughly blend the stochastic time with a formal approach to the problem, which preserves the Markov property. We base our model on the distinction between the decision epoch and the stochastic time. We discuss and illustrate the applicability of our model through an empirical example.

Keywords: decision under uncertainty; dynamic programming; Monte Carlo simulation; project management; R&D projects.

