A scheme that successfully employs quantum mechanics in the design of autonomous learning agents has recently been reported in the context of the projective simulation (PS) model for artificial intelligence. In that approach, the key feature of a PS agent, a specific type of memory which is explored via random walks, was shown to be amenable to quantization, allowing for a speed-up. In this work we propose an implementation of such classical and quantum agents in systems of trapped ions. We employ a generic construction by which the classical agents are 'upgraded' to their quantum counterparts by a nested process of adding coherent control, and we outline how this construction can be realized in ion traps. Our results provide a flexible modular architecture for the design of PS agents. Furthermore, we present numerical simulations of simple PS agents that analyze the robustness of our proposal under certain noise models.

The outline of this paper is as follows. In section 2 we briefly review the PS model and give the basic operational elements which have to be constructed in an implementation of a classical or quantum PS agent. Then, in section 3, we give a more formal treatment of the standard, classical PS agent, and show explicitly how such an agent may be implemented in an ion-trap set-up. In particular, in section 3.3, we discuss how the technique of adding coherent control provides a generic construction for emulating the standard PS agent in quantum systems, specifically in trapped ions. Finally, in section 4, we extend our analysis to quantum PS agents by specifying all required operations and describing their implementation in ion traps. In the appendix we further present a simple example of a quantum PS agent that can be straightforwardly implemented in an ion trap, for which we provide numerical simulations incorporating an appropriate error model.
2. The PS model

The central component of a PS agent, illustrated in figure 1, is the episodic and compositional memory (ECM), which can be formally represented as a stochastic network of clips. Clips are the units of episodic memory and consist of memorized percepts, actions and ensuing rewards. The process of PS is triggered by perceptual input, which initiates a random walk over the clip space. This walk constitutes the stochastic replay of previously established memories and precedes the initiation of real action. The agent's capability to learn is realized by two mechanisms: (i) the adaptation of the transition probabilities between the clips, and (ii) the addition of new clips under compositional principles.

More formally, at any instant of time the ECM of an agent can be represented as a directed weighted graph, where the vertices represent the clips and the weights of the edges represent the transition probabilities, see figure 2. We refer to this graph as the clip network. The random walk, or equivalently the Markov chain, associated with the process of PS is carried out over the clip network. Finally, the learning aspect of the agent is realized by updating the clip network based on the (rewarded) experience of the agent.
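To make these operational elements concrete, the following Python sketch simulates a minimal two-layer clip network, in which percept clips are connected directly to action clips and the random walk reduces to a single hop. The class name, the damping parameter gamma, and the particular form of the weight-update rule are illustrative assumptions made for this sketch, not a specification of the agents constructed in this paper; in the general model the walk may pass through intermediate clips before an action clip is reached.

```python
import random

class TwoLayerPSAgent:
    """Minimal PS agent whose clip network has one percept layer and one
    action layer (a hypothetical, simplified instance for illustration)."""

    def __init__(self, percepts, actions, gamma=0.01):
        self.actions = list(actions)
        self.gamma = gamma  # damping toward the initial weights (assumed form)
        # Edge weights of the clip network; all initialized to 1,
        # i.e. uniform transition probabilities from each percept clip.
        self.h = {(p, a): 1.0 for p in percepts for a in self.actions}

    def deliberate(self, percept):
        """One hop of the random walk: sample an action clip with
        probability proportional to the weight of the connecting edge."""
        weights = [self.h[(percept, a)] for a in self.actions]
        return random.choices(self.actions, weights=weights)[0]

    def learn(self, percept, action, reward):
        """Update the clip network: damp every edge toward its initial
        weight, then reinforce the edge that was actually traversed."""
        for edge in self.h:
            self.h[edge] -= self.gamma * (self.h[edge] - 1.0)
        self.h[(percept, action)] += reward

# Usage: the agent gradually learns a fixed percept-action association.
agent = TwoLayerPSAgent(percepts=("red", "green"), actions=("left", "right"))
target = {"red": "left", "green": "right"}
for _ in range(500):
    percept = random.choice(("red", "green"))
    action = agent.deliberate(percept)
    agent.learn(percept, action, reward=1.0 if action == target[percept] else 0.0)
```

After repeated rewarded rounds, the weights of the rewarded edges grow and the random walk becomes increasingly likely to select the associated actions, which is the sense in which the two update mechanisms above realize learning.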