One of the most prominent research goals in the field of mobile autonomous robots is to create robots that are able to adapt to new environments, i.e., robots that learn during their "lifetime" with little or no human intervention. When artificial neural networks (ANNs) are employed to control the robot, reinforcement learning (RL) techniques are a good candidate for achieving continuous on-line learning. A problem with RL applied to robot learning is that the state (and action) space of a robot is typically continuous rather than discrete; the robot would therefore have to evaluate an infinite number of possible actions at every time step in order to select the best one. To overcome this problem, we add a second network module to the neurocontroller that acts as a memory of the robot's previous decisions (state-action pairs). The robot's actual decisions are then based on previous decisions retrieved from this memory. In addition, intrinsic noise in the memory network allows the robot to evaluate new "ideas", so that it becomes creative. We analyze the potential of this approach by measuring the ability of (simulated) robots to learn simple tasks using temporal difference (TD) learning.
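The following is a minimal Python sketch of the idea described above, not the authors' actual network implementation: the memory module is approximated here as a table of stored state-action pairs rather than a second neural network, the nearest-neighbour retrieval, the noise level, and names such as MemoryModule, recall, and td_update are illustrative assumptions, and the environment is a dummy placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

class MemoryModule:
    """Stores previous decisions (state-action pairs) together with a value estimate."""

    def __init__(self, action_dim=2, noise_std=0.1):
        self.states, self.actions, self.values = [], [], []
        self.action_dim = action_dim
        self.noise_std = noise_std  # intrinsic noise: lets the robot try new "ideas"

    def recall(self, state):
        """Retrieve the action of the most similar stored state, perturbed by noise."""
        if not self.states:
            # Memory is empty: fall back to a random action.
            return rng.uniform(-1.0, 1.0, size=self.action_dim), None
        dists = [np.linalg.norm(state - s) for s in self.states]
        i = int(np.argmin(dists))
        noisy_action = self.actions[i] + rng.normal(0.0, self.noise_std, size=self.action_dim)
        return noisy_action, i

    def store(self, state, action, value=0.0):
        """Remember the decision just taken so it can guide future behaviour."""
        self.states.append(np.asarray(state, dtype=float))
        self.actions.append(np.asarray(action, dtype=float))
        self.values.append(float(value))

def td_update(memory, idx, reward, next_value, alpha=0.1, gamma=0.9):
    """TD(0) update of the value attached to the recalled decision."""
    if idx is None:
        return
    td_error = reward + gamma * next_value - memory.values[idx]
    memory.values[idx] += alpha * td_error

# Toy on-line learning loop (sensor readings and rewards are placeholders).
memory = MemoryModule()
state = rng.uniform(-1.0, 1.0, size=3)          # hypothetical sensor reading
for step in range(100):
    action, recalled_idx = memory.recall(state)  # decide based on remembered decisions
    next_state = rng.uniform(-1.0, 1.0, size=3)  # would come from the robot/simulator
    reward = -np.linalg.norm(action)             # dummy reward signal
    _, next_idx = memory.recall(next_state)
    next_value = memory.values[next_idx] if next_idx is not None else 0.0
    td_update(memory, recalled_idx, reward, next_value)
    memory.store(state, action)                  # the new decision becomes memory
    state = next_state
```

In this sketch the noise injected during recall plays the exploratory role attributed in the text to intrinsic noise in the memory network, while the TD(0) update gradually assigns credit to remembered decisions.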