Service composition is a mainstream paradigm for rapidly constructing large-scale distributed applications. QoS-aware service composition, i.e., the selection of the execution plan that optimizes the composition's end-to-end QoS properties, is an active area of research and development in service composition. In this paper, we propose PPDRL, a pretraining-and-policy-based deep reinforcement learning approach to the QoS-aware service composition problem. Its key feature is the incorporation of maximum likelihood estimation and a policy scoring mechanism into a deep reinforcement learning framework. As a result, our approach can adaptively balance exploitation and exploration, and can search the solution space robustly and efficiently. We evaluated our approach on six randomly generated QoS-aware service composition problems of varying size and structure, built from the QWS dataset of 2,507 real Web services grouped into 233 categories. The results indicate that our approach finds near-optimal solutions within a moderate number of iterations and outperforms five state-of-the-art algorithms.
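To make the pretrain-then-fine-tune idea concrete, the following is a minimal sketch, not the paper's actual algorithm: a tabular softmax policy chooses one concrete service per abstract task, is pretrained by maximum likelihood on heuristic demonstration plans, and is then fine-tuned with REINFORCE on an additive QoS reward. The problem sizes, the synthetic `qos` utilities, the greedy demonstrations, and the moving-average baseline (standing in for the paper's policy scoring mechanism) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, K = 5, 8                       # abstract tasks x candidate services (toy sizes)
qos = rng.random((T, K))          # synthetic normalized QoS utility per candidate

logits = np.zeros((T, K))         # tabular softmax policy over candidates per task

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_plan():
    """Sample one execution plan (one service index per task) from the policy."""
    return np.array([rng.choice(K, p=softmax(logits[t])) for t in range(T)])

def reward(plan):
    """End-to-end QoS of a plan; additive aggregation as a simple stand-in."""
    return qos[np.arange(T), plan].sum()

# Pretraining: maximum likelihood on demonstration plans (here, greedy
# per-task choices; the paper's source of pretraining data may differ).
demo, lr = qos.argmax(axis=1), 0.5
for _ in range(50):
    for t in range(T):
        p = softmax(logits[t])
        grad = -p
        grad[demo[t]] += 1.0      # gradient of log-softmax at the demo action
        logits[t] += lr * grad

# Fine-tuning: REINFORCE with a moving-average baseline, which here plays
# the exploitation/exploration-balancing role of the scoring mechanism.
baseline, lr = 0.0, 0.1
for _ in range(2000):
    plan = sample_plan()
    r = reward(plan)
    baseline += 0.05 * (r - baseline)
    adv = r - baseline            # centered reward drives the policy update
    for t in range(T):
        p = softmax(logits[t])
        grad = -p
        grad[plan[t]] += 1.0
        logits[t] += lr * adv * grad

best = logits.argmax(axis=1)
print("greedy plan:", best, "QoS:", reward(best))
```

The MLE pretraining phase biases the policy toward plausible plans before any reinforcement signal is used, so the fine-tuning phase starts from an informed rather than uniform policy; this is one way to read the abstract's claim of robust, efficient search.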