Network Function Virtualization (NFV) and service orchestration simplify the deployment and management of network and telecommunication services. The deployment of these services requires, typically, the allocation of Virtual Network Function -Forwarding Graph (VNF-FG), which implies not only the fulfillment of the service's requirements in terms of Quality of Service (QoS), but also considering the constraints of the underlying infrastructure. This topic has been well-studied in existing literature, however, its complexity and uncertainty of available information unveil challenges for researchers and engineers. In this paper, we explore the potential of reinforcement learning techniques for the placement of VNF-FGs. However, it turns out that even the most well-known learning technique is ineffective in the context of a large-scale action space. In this respect, we propose approaches to find out feasible solutions while improving significantly the exploration of the action space. The simulation results clearly show the effectiveness of the proposed learning approach for this category of problems. Moreover, thanks to the deep learning process, the performance of the proposed approach is improved over time.