Bayesian networks are a popular diagnosis method whose structures are usually defined by human experts and whose parameters are learned from data. As modern systems grow more complex, building these structures from physical behavior is becoming difficult. At the same time, improvements in data collection techniques motivate learning the structures from data, for which greedy search is a typical iterative method. In each iteration, greedy search generates multiple structure candidates by modifying one edge, evaluates each candidate with a data-based score, and selects the best structure for the next iteration. This method is costly because a large number of structures must be evaluated. To address this problem, we frame traditional greedy search as a Markov decision process and propose an efficient Bayesian network learning approach that integrates reinforcement learning into it. In our approach, a convolutional neural network serves as the value function that approximates scores. Before structures are evaluated using data, the neural network predicts their scores, and structure candidates with a low predicted score are discarded. By avoiding unnecessary computation, the combination of reinforcement learning and greedy search improves learning efficiency. Two systems, a 10-tank system with 21 monitored variables and the classic Tennessee Eastman process with 52 variables, are used to demonstrate our approach. The experimental results indicate that our method reduces computation cost by 30% to 50% while diagnosis accuracy remains almost the same.

INDEX TERMS Data-driven fault diagnosis, Bayesian network, greedy search, reinforcement learning.
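As a rough illustration of the screening idea summarized above, the following Python sketch runs a greedy search over directed acyclic graphs in which a cheap predictor ranks the one-edge candidates and only the top-ranked fraction is passed to the expensive score. Everything named here is an assumption made for illustration: `toy_score` is a stand-in for a data-based score, the predictor passed as `predict` is a stand-in for the convolutional value function, and the `keep_ratio` rule is one possible way to discard candidates with low predicted scores. It is not the scoring metric, network, or discard rule used in the paper.

```python
from itertools import permutations

def is_acyclic(n, edges):
    """Return True if the directed graph on nodes 0..n-1 has no cycle."""
    children = {i: [] for i in range(n)}
    indegree = [0] * n
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    stack = [i for i in range(n) if indegree[i] == 0]
    visited = 0
    while stack:  # Kahn's algorithm: repeatedly remove source nodes
        u = stack.pop()
        visited += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                stack.append(v)
    return visited == n

def neighbors(n, edges):
    """Yield every DAG obtained by adding, deleting, or reversing one edge."""
    for u, v in permutations(range(n), 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                 # deletion keeps acyclicity
            cand = (edges - {(u, v)}) | {(v, u)}   # reversal
        elif (v, u) not in edges:
            cand = edges | {(u, v)}                # addition
        else:
            continue  # handled when the loop reaches the pair (v, u)
        if is_acyclic(n, cand):
            yield cand

def screened_greedy_search(n, score, predict, keep_ratio=0.5, max_iter=100):
    """Greedy search that scores only the candidates ranked highest by predict.

    Returns (best structure, its score, number of expensive score calls).
    """
    current = frozenset()
    current_score = score(current)
    evaluations = 1
    for _ in range(max_iter):
        cands = sorted(neighbors(n, current), key=predict, reverse=True)
        # Assumed discard rule: keep only the top keep_ratio fraction.
        kept = cands[:max(1, int(len(cands) * keep_ratio))]
        scored = [(score(c), c) for c in kept]
        evaluations += len(scored)
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score <= current_score:
            break  # no kept candidate improves the score
        current, current_score = best, best_score
    return current, current_score, evaluations

# Hypothetical toy problem: the score counts disagreements with a known target.
TARGET = frozenset({(0, 1), (1, 2)})

def toy_score(edges):
    """Stand-in for a data-based score; higher is better, 0 is a perfect match."""
    return -len(edges ^ TARGET)

if __name__ == "__main__":
    # The predictor here is the score itself, i.e. an idealized value function.
    for ratio in (1.0, 0.5):
        result = screened_greedy_search(3, toy_score, toy_score, keep_ratio=ratio)
        print(ratio, sorted(result[0]), result[1], result[2])
```

With `keep_ratio=1.0` the sketch reduces to plain greedy search, so comparing the returned evaluation counts for the two ratios gives a feel for how screening trades predictor quality against the number of expensive score calls. With an imperfect predictor, a low `keep_ratio` may discard the true best candidate, which is the accuracy risk the approach has to manage.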