Deep learning has proved an effective means of capturing the non-linear associations in user preferences. However, the main drawback of existing deep learning architectures is that they follow a fixed recommendation strategy, ignoring users' real-time feedback. Recent advances in deep reinforcement learning strategies have shown that recommendation policies can be continuously updated while users interact with the system. In doing so, we can learn the optimal policy that fits users' preferences over the recommendation sessions. The main drawback of deep reinforcement learning strategies, however, is that they are based on predefined and fixed neural architectures. To shed light on how to handle this issue, in this study we first present deep reinforcement learning strategies for recommendation and discuss their main limitations due to the fixed neural architectures. Then, we detail how recent advances in progressive neural architectures are used for consecutive tasks in other research domains. Finally, we present the key challenges in bridging the gap between deep reinforcement learning and adaptive neural architectures. We provide guidelines for searching for the best neural architecture based on each user's feedback via reinforcement learning, while considering both the prediction performance on real-time recommendations and the model complexity.