The last decade has seen unprecedented growth in artificial intelligence and photonic technologies, both of which push the limits of modern-day computing devices. In line with these recent developments, this work brings together the state of the art of both fields within the framework of reinforcement learning. We present the blueprint for a photonic implementation of an active learning machine incorporating contemporary algorithms such as SARSA, Q-learning, and projective simulation. We numerically investigate its performance within typical reinforcement learning environments, showing that realistic levels of experimental noise can be tolerated or even be beneficial for the learning process. Remarkably, the architecture itself enables mechanisms of abstraction and generalization, two features which are often considered key ingredients for artificial intelligence. The proposed architecture, based on single-photon evolution on a mesh of tunable beamsplitters, is simple and scalable, and a first integration in quantum optical experiments appears to be within reach of near-term technology.

promising features as compared to electronic processors [27–30]. For instance, nanosecond-scale routing and reconfigurability have already been demonstrated [31–33], while encoding information in photons enables decision-making at the speed of light, limited only by the generation and detection rates. Moreover, the use of phase-change materials for in-memory information processing [34] promises to enhance energy efficiency, since their properties can be modified without continuous external intervention [35,36]. Importantly, since the architecture uses single photons, decision-making is fueled by genuine quantum randomness. This feature marks a fundamental departure from pseudorandom number generation in conventional devices.
(ii) The second contribution is the development of a specific variant of PS based on binary decision trees (tree-PS, or t-PS for short), which is closely connected to standard PS and suitable for implementation on a photonic circuit. Furthermore, we discuss how this variant enables key features of artificial intelligence, namely abstraction and generalization [37,38].

The article is structured as follows. In section 2 we summarize the theoretical framework of RL, exemplified by three common approaches: SARSA, Q-learning, and PS. In section 3 we describe the blueprint for a fully integrated, photonic RL agent. In section 4 we numerically investigate its performance within two standard RL tasks and under realistic experimental imperfections. Finally, in section 5 we discuss promising features of this architecture within the context of t-PS.
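As an informal illustration of the binary-decision-tree picture behind t-PS, the following minimal Python sketch routes a single excitation through a complete binary tree of two-way splitters and samples the leaf (action) it exits from. It is not the implementation developed in this work: the names `leaf_probabilities` and `sample_action`, the nested-list layout of `reflectivities`, and the numerical values are hypothetical, and each splitter is assumed to be fully described by one probability of taking the left output port. A pseudorandom generator stands in for the quantum randomness of a real single-photon measurement.

```python
import random


def leaf_probabilities(reflectivities, depth):
    """Exit probability of each of the 2**depth leaves.

    reflectivities[level][node] is the assumed probability that the
    excitation leaves the splitter at (level, node) through its left port.
    """
    probs = [1.0]  # the excitation enters at the root with certainty
    for level in range(depth):
        next_probs = []
        for node, p in enumerate(probs):
            r = reflectivities[level][node]
            next_probs.append(p * r)          # left output port
            next_probs.append(p * (1.0 - r))  # right output port
        probs = next_probs
    return probs


def sample_action(reflectivities, depth, rng=random):
    """Sample one leaf index by making a binary choice at every level."""
    node = 0
    for level in range(depth):
        r = reflectivities[level][node]
        go_left = rng.random() < r
        node = 2 * node + (0 if go_left else 1)
    return node


# Hypothetical depth-2 tree: one splitter at the root, two below it.
reflectivities = [[0.5], [0.8, 0.25]]
probs = leaf_probabilities(reflectivities, depth=2)
action = sample_action(reflectivities, depth=2, rng=random.Random(0))
```

In this toy model, the probability of each action is the product of the branching probabilities along its path, so changing a single splitter setting near the root rescales the probabilities of all actions beneath it at once.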