The literature on performance optimization in optical networks consists predominantly of single-objective studies, whereas in practice multiple performance goals often have to be met at once. This study addresses that gap with a generalized reinforcement learning (RL) model for parameter optimization in optical networks under multiple performance goals. From this generic model, two multi-objective variants of a classical problem in optical network operation, routing and wavelength assignment (RWA), are derived and solved to near optimality. The route and wavelength allocated to each demand are optimized with respect to the number of accepted services, the number of transmitters, and network availability. The resulting approximate Pareto front provides a set of solutions from which network operators can choose according to their preferences among the objectives. These results contribute to the understanding of the relationships between network parameters and performance metrics, which should benefit future network design and growth. Moreover, benchmarking against state-of-the-art RWA heuristics suggests that RL is applicable in dynamic settings with changing traffic and generalizes to unseen traffic.
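To make the notion of an approximate Pareto front concrete, the following minimal Python sketch filters a set of candidate RWA solutions down to the non-dominated ones. It is an illustration only, not the RL method of this study: the function names `dominates` and `pareto_front`, the objective directions (more accepted services, fewer transmitters, and higher availability are assumed to be better), and all numeric values are hypothetical.

```python
from typing import List, Tuple

# Each candidate is (accepted_services, num_transmitters, availability).
# Assumed directions: maximize services, minimize transmitters, maximize availability.
Solution = Tuple[float, float, float]


def dominates(a: Solution, b: Solution) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on one."""
    # Negate the transmitter count so that "larger is better" holds for all three entries.
    a_max = (a[0], -a[1], a[2])
    b_max = (b[0], -b[1], b[2])
    no_worse = all(x >= y for x, y in zip(a_max, b_max))
    strictly_better = any(x > y for x, y in zip(a_max, b_max))
    return no_worse and strictly_better


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates (input order preserved)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidates, invented for illustration (not results from the study).
candidates: List[Solution] = [
    (95, 40, 0.9990),
    (95, 44, 0.9995),
    (90, 36, 0.9985),
    (90, 40, 0.9980),  # dominated by (95, 40, 0.9990)
    (85, 36, 0.9985),  # dominated by (90, 36, 0.9985)
]

front = pareto_front(candidates)
for services, transmitters, availability in front:
    print(services, transmitters, availability)
```

With these invented values, the first three candidates remain on the front: each trades one objective against another, so the choice among them depends on the operator's preferences.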