Deep Reinforcement Learning (DRL) is emerging as a promising tool for solving optimization problems in optical networks. Although studies employing DRL to solve static optimization problems in optical networks are appearing, assessing the strengths and weaknesses of DRL with respect to state-of-the-art solution methods is still an open research question. In this work, we focus on Routing and Wavelength Assignment (RWA), a well-studied problem for which fast and scalable algorithms achieving smaller optimality gaps are always sought. We develop two different DRL-based methods to assess the impact of different design choices on DRL performance. In addition, we propose a Multi-Start approach that can improve the average DRL performance, and we engineer a shaped reward that allows efficient learning in networks with high link capacities. With Multi-Start, DRL achieves results competitive with a state-of-the-art Genetic Algorithm, with significant savings in computational time. Moreover, we assess the generalization capabilities of DRL to traffic matrices unseen during training, in terms of total connection requests and traffic distribution, showing that DRL can generalize to small-to-moderate deviations from the training traffic matrices. Finally, we assess DRL scalability with respect to topology size and link capacity.