Currently, many methods are available to improve the target network's security. The vast majority of them cannot obtain an optimal attack path and interdict it dynamically and conveniently. Almost all defense strategies aim to repair known vulnerabilities or limit services in target network to improve security of network. These methods cannot response to the attacks in real-time because sometimes they need to wait for manufacturers releasing corresponding countermeasures to repair vulnerabilities. In this paper, we propose an improved Q-learning algorithm to plan an optimal attack path directly and automatically. Based on this path, we use software-defined network (SDN) to adjust routing paths and create hidden forwarding paths dynamically to filter vicious attack requests. Compared to other machine learning algorithms, Q-learning only needs to input the target state to its agents, which can avoid early complex training process. We improve Q-learning algorithm in two aspects. First, a reward function based on the weights of hosts and attack success rates of vulnerabilities is proposed, which can adapt to different network topologies precisely. Second, we remove the actions and merge them into every state that reduces complexity from ( 3 ) to ( 2 ). In experiments, after deploying hidden forwarding paths, the security of target network is boosted significantly without having to repair network vulnerabilities immediately.