With the rapid progress of urbanization and the continuous increase in automobile ownership, expressway on- and off-ramp areas have become bottlenecks where recurrent congestion occurs frequently. Various methods have been employed to relieve the congestion that forms around off-ramps; among them, mainline variable speed limit (VSL) control is a notable approach. In this study, mainline VSL adjustment upstream of the off-ramp is investigated with a reinforcement learning algorithm under a connected-vehicle environment to alleviate traffic congestion. First, assumptions suited to the traffic conditions of mainline VSL control upstream of the off-ramp are stated; a VSL algorithm based on reinforcement learning is then presented, with Q-learning chosen as the core algorithm. Next, the state space, action space, and reward function required by Q-learning are constructed in order, and the related parameters are specified. After that, on a simulation platform combining Python and VISSIM, three schemes are designed: free control (Scheme 0), rule-based mainline VSL adjustment upstream of the off-ramp (Scheme 1), and Q-learning-based mainline VSL adjustment upstream of the off-ramp (Scheme 2). The three schemes are simulated and compared quantitatively in terms of off-ramp travel efficiency. The results indicate that dynamic mainline VSL adjustment upstream of the off-ramp based on the Q-learning algorithm performs best on both general and specific indexes. These results provide potential insights for relieving traffic congestion and for traffic flow control under a connected and automated vehicle (CAV) environment.
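The Q-learning scheme summarized above can be illustrated with a minimal tabular sketch. Note that the state discretization, candidate speed limits, reward signal, and the toy environment below are all placeholder assumptions for illustration; the paper's actual state space, action space, reward function, and VISSIM-coupled simulation are not reproduced here.

```python
import random

# Hypothetical discretization (not the paper's actual definitions):
SPEED_LIMITS = [60, 70, 80, 90, 100]   # candidate mainline speed limits (km/h)
N_STATES = 4                           # e.g. binned congestion levels near the off-ramp

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

# Q-table: one row per state, one column per candidate speed limit.
Q = [[0.0] * len(SPEED_LIMITS) for _ in range(N_STATES)]

def choose_action(state):
    """Epsilon-greedy selection over candidate speed limits."""
    if random.random() < EPSILON:
        return random.randrange(len(SPEED_LIMITS))
    row = Q[state]
    return row.index(max(row))

def update(state, action, reward, next_state):
    """Standard tabular Q-learning update rule."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

def toy_env_step(state, action):
    """Stand-in for one VISSIM simulation step (purely illustrative):
    the placeholder reward favors matching the limit index to the state."""
    reward = -abs(action - state)
    next_state = random.randrange(N_STATES)
    return reward, next_state

random.seed(0)
state = 0
for _ in range(5000):
    action = choose_action(state)
    reward, next_state = toy_env_step(state, action)
    update(state, action, reward, next_state)
    state = next_state
```

In the study's setting, `toy_env_step` would be replaced by advancing the VISSIM simulation under the chosen speed limit and computing the reward from observed off-ramp traffic conditions.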