Abstract: Novel anthropomorphic robotic systems increasingly employ variable impedance actuation to achieve the robustness to uncertainty, superior agility, and efficiency that are hallmarks of biological systems. Controlling and modulating impedance profiles so that they are optimally tuned to the controlled plant is crucial to realising these benefits. In this work, we propose a methodology for generating optimal control commands for variable impedance actuators under a prescribed trade-off between task accuracy and energy cost. In contrast to classical optimal control methods, which typically require an accurate analytical model of the plant dynamics, we employ a supervised learning paradigm to acquire both the process dynamics and the stochastic properties of the plant. This enables us to prescribe an optimal impedance and command profile that (i) is tuned to the hard-to-model stochastic characteristics of the plant and (ii) adapts to systematic changes such as a change in load.