In this Letter, reinforcement learning (RL) is applied to improve the performance of a polarization modulator (PolM)-based W-band radio-over-fiber (RoF) system. In the PolM-based scheme, the RF response can be readily modified by controlling the polarization angle of the dual-wavelength laser source, which greatly increases the available bandwidth of the RoF system. In the proposed RL scheme, the state is defined as the angle value from the polarization controller (PC), and the actions of the RL agent are changes to the angle of the polarizer (P), chosen to optimize the frequency response. The agent receives a reward from the system, given by the error vector magnitude (EVM) at each state, and learns from the environment and its previous experience. The proposed RL scheme is implemented and demonstrated in a multi-channel RoF system, and the results show that an RL agent provides an effective intelligent technique for obtaining the best quality of data transmission.
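
The state, action, and reward roles described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from this Letter: the learning rule is tabular Q-learning (the RL algorithm is not named here), the PC and polarizer angles are collapsed into a single discretized angle for brevity, the reward is the negative of the EVM so that maximizing reward minimizes EVM, and `toy_evm` is a hypothetical placeholder curve rather than a model of the RoF link. All names and numerical values are illustrative.

```python
import math
import random

# Hypothetical discretization: angles in degrees on a 10-degree grid.
ANGLES = list(range(0, 91, 10))
# Actions: decrease the angle by one grid step, hold, or increase it.
ACTIONS = (-1, 0, 1)
# Assumed location of the EVM minimum in the toy curve below.
BEST_DEG = 30.0


def toy_evm(angle_deg):
    """Placeholder EVM (%) with one minimum at BEST_DEG; not a system model."""
    return 5.0 + 30.0 * math.sin(math.radians(angle_deg - BEST_DEG)) ** 2


def step(state, action):
    """Apply an action to the angle index; reward is the negative EVM (assumption)."""
    nxt = min(max(state + ACTIONS[action], 0), len(ANGLES) - 1)
    return nxt, -toy_evm(ANGLES[nxt])


def greedy_action(q, state):
    """Index of the action with the highest Q-value in this state."""
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])


def train(episodes=500, steps=50, alpha=0.5, gamma=0.8, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in ANGLES]
    for _ in range(episodes):
        s = rng.randrange(len(ANGLES))  # random starting angle
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy_action(q, s)  # exploit
            s2, r = step(s, a)
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_rollout(q, start, steps=30):
    """Follow the learned greedy policy and return the final angle in degrees."""
    s = start
    for _ in range(steps):
        s, _ = step(s, greedy_action(q, s))
    return ANGLES[s]
```

Calling `q = train()` and then `greedy_rollout(q, start)` for any starting index returns the angle at which the learned policy settles. In a real system the call to `toy_evm` would be replaced by an EVM measurement taken after each angle adjustment.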