With the emergence of various types of applications such as delay-sensitive applications, future communication networks are expected to be increasingly complex and dynamic. Network Function Virtualization (NFV) provides the necessary support towards efficient management of such complex networks, by virtualizing network functions and placing them on shared commodity servers. However, one of the critical issues in NFV is the resource allocation for the highly complex services; moreover, this problem is classified as an NP-Hard problem. To solve this problem, our work investigates the potential of Deep Reinforcement Learning (DRL) as a swift yet accurate approach (as compared to integer linear programming) for deploying Virtualized Network Functions (VNFs) under several Quality-of-Service (QoS) constraints such as latency, memory, CPU, and failure recovery requirements. More importantly, the failure recovery requirements are focused on the node-outage problem where outage can be either due to a disaster or unavailability of network topology information (e.g., due to proprietary and ownership issues). In DRL, we adopt a Deep Q-Learning (DQL) based algorithm where the primary network estimates the action-value function Q, as well as the predicted Q, highly causing divergence in Q-value’s updates. This divergence increases for the larger-scale action and state-space causing inconsistency in learning, resulting in an inaccurate output. Thus, to overcome this divergence, our work has adopted a well-known approach, i.e., introducing Target Neural Networks and Experience Replay algorithms in DQL. The constructed model is simulated for two real network topologies—Netrail Topology and BtEurope Topology—with various capacities of the nodes (e.g., CPU core, VNFs per Core), links (e.g., bandwidth and latency), several VNF Forwarding Graph (VNF-FG) complexities, and different degrees of the nodal outage from 0% to 50%. We can conclude from our work that, with the increase in network density or nodal capacity or VNF-FG’s complexity, the model took extremely high computation time to execute the desirable results. Moreover, with the rise in complexity of the VNF-FG, the resources decline much faster. In terms of the nodal outage, our model provided almost 70–90% Service Acceptance Rate (SAR) even with a 50% nodal outage for certain combinations of scenarios.