“…G2P is a hierarchical autonomous learning algorithm that, at its lower level, creates an inverse kinematics map using output kinematics collected from an initial random set of actuation commands (motor babbling). Systems that use an explicit kinematics model are, in general, easier to study and interpret, more data-efficient, and able to generalize to a wider range of tasks; however, they can suffer from inaccuracies in the model, especially during complex dynamical interactions (e.g., contact dynamics, injury to the body, or changes in the environment) [25], [23], [34], [35], [36], [37]. Systems that perform end-to-end learning (such as PPO), on the other hand, usually require a larger number of samples to learn a task, are harder to interpret because their modeling is implicit, and usually do not generalize well across tasks [38], [39], [33], [40], [41].…”
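
The lower-level step described in the passage (random actuation commands, recorded kinematics, then an inverse map fitted from kinematics back to commands) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the quoted work: `toy_plant` is a hypothetical linear stand-in for the physical system, the dimensions (3 actuators, 2 kinematic outputs) are arbitrary, and a linear least-squares fit is swapped in for whatever function approximator the original method uses.

```python
import numpy as np


def toy_plant(activations):
    """Hypothetical plant: maps 3 actuator commands to 2 kinematic outputs.

    A fixed linear mixing matrix stands in for the real (unknown) system.
    """
    mixing = np.array([[1.0, -0.5, 0.2],
                       [0.3, 0.8, -0.6]])
    return activations @ mixing.T


def motor_babble(n_samples, n_actuators, rng):
    """Draw random actuation commands in [0, 1) (the 'babbling' phase)."""
    return rng.uniform(0.0, 1.0, size=(n_samples, n_actuators))


def fit_inverse_map(kinematics, activations):
    """Fit a linear map (with bias) from observed kinematics to commands.

    Assumption: least squares replaces the original method's learner.
    """
    features = np.hstack([kinematics, np.ones((len(kinematics), 1))])
    weights, *_ = np.linalg.lstsq(features, activations, rcond=None)
    return weights


def predict_activations(weights, desired_kinematics):
    """Query the inverse map: which commands should produce these kinematics?"""
    features = np.hstack(
        [desired_kinematics, np.ones((len(desired_kinematics), 1))]
    )
    return features @ weights


rng = np.random.default_rng(0)

# 1. Motor babbling: random commands and the kinematics they produce.
babble_commands = motor_babble(500, 3, rng)
babble_kinematics = toy_plant(babble_commands)

# 2. Build the inverse map from the babbling data alone.
weights = fit_inverse_map(babble_kinematics, babble_commands)

# 3. Use the map to reach a desired kinematic target.
target = np.array([[0.4, 0.2]])
commands = predict_activations(weights, target)
achieved = toy_plant(commands)
tracking_error = float(np.abs(achieved - target).max())
```

Because the toy plant is exactly linear, the fitted map reproduces the target almost perfectly. On a real system with contact dynamics or other nonlinearities, the same procedure would leave a residual error, which is the model inaccuracy the passage warns about. The sketch also does not clip the predicted commands to the babbled range, which a real actuator would require.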