Vertical Radial Plume Mapping (VRPM) is widely used to measure gas emission flux in open spaces. The technique requires optical remote sensing (ORS) equipment to scan multiple measurement points in order to reconstruct the gas concentration field, but fluctuating field conditions and mechanical errors in the system cause the optical path to deviate. The optical path can be calibrated by searching again for the center of each measurement point according to signal strength; however, the search range must be preset, which makes it difficult to balance time cost against positioning accuracy, lowers the temporal resolution of the concentration data, and introduces errors into the flux calculation. To address this problem, this paper proposes a Q-learning multi-optical-path localization method based on detection signal quality. The method uses the change in signal strength as the optical path moves as the reward for learning the environment; this reward influences the choice of the next calibration direction, so that the optical path preferentially moves in the direction of increasing signal strength. The method is validated on a 25 × 25 grid map that simulates optical path offset. The results show that the method finds the optimal path to the center point, with a minimum of 14 steps and a running time of less than 2 seconds, and that the success rate reaches 100% after many learning episodes, demonstrating the effectiveness of Q-learning for multi-optical-path scanning.
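The idea of rewarding each move by the resulting change in signal strength can be illustrated with a minimal tabular Q-learning sketch on a 25 × 25 grid. This is not the paper's implementation: the linear signal model, the four-direction action set, the starting offset `START`, the out-of-range penalty, and all hyperparameters (`alpha`, `gamma`, `epsilon`, the episode count) are assumptions chosen only for illustration, and every function name is hypothetical.

```python
import numpy as np

SIZE = 25                      # 25 x 25 grid of candidate optical path positions
CENTER = (12, 12)              # cell where the beam is centered on the measurement point
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def signal_strength(pos):
    """Assumed signal model: 1.0 at the center, falling linearly with distance."""
    dist = np.hypot(pos[0] - CENTER[0], pos[1] - CENTER[1])
    max_dist = np.hypot(CENTER[0], CENTER[1])
    return 1.0 - dist / max_dist

def step(pos, action):
    """Move one cell; the reward is the change in signal strength."""
    nxt = (pos[0] + action[0], pos[1] + action[1])
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE):
        return pos, -1.0       # assumed penalty for leaving the scan range
    return nxt, signal_strength(nxt) - signal_strength(pos)

def train(start, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1,
          max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = np.random.default_rng(seed)
    q = np.zeros((SIZE, SIZE, len(ACTIONS)))
    for _ in range(episodes):
        pos = start
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = int(rng.integers(len(ACTIONS)))   # explore
            else:
                a = int(np.argmax(q[pos]))            # exploit
            nxt, reward = step(pos, ACTIONS[a])
            # No bootstrapping from the terminal (center) cell.
            target = reward if nxt == CENTER else reward + gamma * q[nxt].max()
            q[pos][a] += alpha * (target - q[pos][a])
            pos = nxt
            if pos == CENTER:
                break
    return q

def greedy_path(q, start, max_steps=200):
    """Follow the learned greedy policy from start until the center or a step cap."""
    path = [start]
    while path[-1] != CENTER and len(path) <= max_steps:
        a = int(np.argmax(q[path[-1]]))
        nxt, _ = step(path[-1], ACTIONS[a])
        path.append(nxt)
    return path

START = (3, 20)                # hypothetical initial optical path offset
q_table = train(START)
path = greedy_path(q_table, START)
print("reached center:", path[-1] == CENTER, "in", len(path) - 1, "steps")
```

Because the reward is the signal change itself, moves toward the center are reinforced from the first update, which is what lets the learned policy prefer the direction of increasing signal without a preset search range. With only four axis-aligned actions, the shortest possible path in this sketch equals the Manhattan distance from `START` to `CENTER`; the step counts reported in the paper depend on its own map and action set and are not reproduced here.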