Predicting how proteins interact with small molecules is a complex and challenging task in drug discovery. Two important aspects of this interaction are shape complementarity and intermolecular interactions, both of which depend strongly on the binding site and on the final pose the ligand adopts when it binds the protein. Various state-of-the-art methods generate a range of ligand poses that may fit a given receptor well, but these methods are usually compute-intensive and expensive. In this study, we designed a method that provides a single optimized ligand pose for a specific receptor. The method is based on reinforcement learning: when exposed to a diverse protein-ligand dataset, the agent is able to learn the underlying complex biochemistry of protein-ligand pairs and propose an optimized pose. As a first study on using reinforcement learning for ligand pose optimization, the PandoraRLO model predicts poses within 0.5 Å to 4 Å of the reference pose for a large number of test complexes. This indicates the potential of reinforcement learning for uncovering the inherent patterns of protein-ligand pairs in 3D space.
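The 0.5 Å to 4 Å range is a distance between predicted and reference ligand poses. In pose-prediction work this is conventionally the root-mean-square deviation (RMSD) over matched ligand atoms, but the abstract does not name the metric, so the following is only a minimal sketch under that assumption. It assumes both poses share the same atom ordering and are expressed in the receptor's coordinate frame (so no superposition is applied), and it ignores molecular symmetry. The function name `pose_rmsd` is hypothetical and not part of PandoraRLO.

```python
import numpy as np


def pose_rmsd(predicted: np.ndarray, reference: np.ndarray) -> float:
    """Hypothetical helper: RMSD in Å between two ligand poses.

    Assumes both arrays have shape (n_atoms, 3), list atoms in the same
    order, and share the receptor's coordinate frame, so no alignment
    is performed. Symmetry-equivalent atoms are not handled.
    """
    if predicted.shape != reference.shape or predicted.ndim != 2 or predicted.shape[1] != 3:
        raise ValueError("poses must both have shape (n_atoms, 3)")
    # Squared Euclidean distance per atom, averaged, then square-rooted.
    diff = predicted - reference
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Usage: a pose rigidly shifted by 1 Å along x deviates by exactly 1 Å.
reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
predicted = reference + np.array([1.0, 0.0, 0.0])
print(pose_rmsd(predicted, reference))
```

Because docked poses are evaluated against a fixed receptor, the sketch deliberately skips the optimal superposition step used when comparing protein structures; aligning the ligand first would hide placement errors in the binding site.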