Moving target defense (MTD) technology thwarts potential attacks by dynamically changing the software in use and/or its configuration while maintaining the application’s running state. However, it incurs a deployment cost and various runtime overheads that degrade performance. An attack graph can be used to evaluate the balance between the effectiveness and the cost of an MTD deployment. In this study, we consider a network scenario in which each node in the attack graph can deploy MTD technology. We aim to achieve MTD deployment effectiveness optimization (MTD-DO), that is, to minimize the network security loss under a limited budget. Existing related work either considers only a single node for MTD deployment or ignores the deployment cost. We first establish a non-linear MTD-DO formulation. We then develop two deep reinforcement learning-based algorithms, one using a deep Q-network (DQN) and one using proximal policy optimization (PPO). Moreover, we define two metrics to evaluate MTD-DO algorithms effectively across varying network scales and budgets. The experimental results indicate that both the PPO- and DQN-based algorithms outperform Q-learning-based and random algorithms. The DQN-based algorithm converges more quickly and achieves a marginally higher reward than the PPO-based algorithm.
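To make the budget-constrained deployment problem described above concrete, the following is a minimal toy sketch, not the formulation or the algorithms of this study. It assumes an additive loss model in which every node without MTD contributes a fixed loss, whereas the actual MTD-DO formulation is non-linear and defined over an attack graph. The node names, costs, loss values, and the functions `security_loss`, `random_deployment`, and `best_deployment` are all hypothetical. The sketch compares a random baseline with an exhaustive search, which is tractable only for very small instances.

```python
import itertools
import random

# Hypothetical toy instance: per-node MTD deployment cost and the security
# loss each node contributes if left unprotected. All numbers are invented.
COST = {"a": 3, "b": 2, "c": 4, "d": 1}
LOSS = {"a": 5.0, "b": 2.0, "c": 6.0, "d": 1.5}


def security_loss(deployed):
    """Total loss from nodes without MTD (additive toy model, an assumption)."""
    return sum(loss for node, loss in LOSS.items() if node not in deployed)


def random_deployment(budget, rng):
    """Random baseline: keep picking affordable nodes until none fits."""
    deployed, remaining = set(), budget
    while True:
        affordable = [
            n for n in COST if n not in deployed and COST[n] <= remaining
        ]
        if not affordable:
            return deployed
        choice = rng.choice(affordable)
        deployed.add(choice)
        remaining -= COST[choice]


def best_deployment(budget):
    """Exhaustive search over all node subsets within the budget."""
    nodes = list(COST)
    feasible = (
        set(subset)
        for r in range(len(nodes) + 1)
        for subset in itertools.combinations(nodes, r)
        if sum(COST[n] for n in subset) <= budget
    )
    return min(feasible, key=security_loss)


if __name__ == "__main__":
    budget = 5
    optimum = best_deployment(budget)
    baseline = random_deployment(budget, random.Random(0))
    print("optimal:", sorted(optimum), security_loss(optimum))
    print("random: ", sorted(baseline), security_loss(baseline))
```

Because the number of subsets grows exponentially with the number of nodes, exhaustive search does not scale, which is one reason learning-based methods are attractive for larger networks.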