We adopt a reinforcement learning algorithm to train a self-propelling agent to migrate long distances in a thermal turbulent environment. We choose a Rayleigh-Bénard turbulent convection cell with an aspect ratio Γ (defined as the ratio of cell length to cell height) of 2 as the training environment. Our results show that, compared with a naive agent that moves straight from the origin to the destination, the smart agent learns to exploit the carrier flow currents to save propelling energy. We then apply the optimal policy obtained in the Γ = 2 cell to smart agents migrating in convection cells with Γ up to 32. In a larger-Γ cell, the dominant flow modes of horizontally stacked rolls are less stable, and the energy contained in higher-order flow modes increases. We find that the optimized policy extends successfully to convection cells with larger Γ. Moreover, the ratio of propelling energy consumed by the smart agent to that consumed by the naive agent decreases as Γ increases, indicating that the smart agent saves more propelling energy in a larger-Γ cell. To test the robustness of the learning framework, we also evaluate the optimized policy when agents are released from randomly chosen origins, and we suggest possible solutions to improve the success rate. This work has implications for long-distance migration problems, such as unmanned aerial vehicles patrolling in a turbulent convective environment, where planning energy-efficient trajectories can increase their endurance.
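The training setup can be caricatured with a minimal tabular Q-learning sketch. Everything below is an illustrative assumption rather than the paper's actual scheme: the grid resolution, the reward of -1 per unit of propelling energy per step, the terminal bonus, and the toy two-roll velocity field standing in for the turbulent convection cell are all hypothetical choices made only to show the shape of such a training loop.

```python
import math
import random

random.seed(0)

# Hypothetical coarse grid over a cell of aspect ratio 2 (length 2, height 1).
NX, NY = 20, 10
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # propel right / left / up / down

def roll_drift(ix, iy):
    """Integer drift from a toy two-counter-rotating-roll flow field
    (an illustrative stand-in for the turbulent carrier flow)."""
    x = (ix + 0.5) / NX * 2.0   # x in [0, 2]
    y = (iy + 0.5) / NY         # y in [0, 1]
    u = math.sin(math.pi * x) * math.cos(math.pi * y)
    v = -math.cos(math.pi * x) * math.sin(math.pi * y)
    return round(u), round(v)

def step(state, action):
    """One step: propulsion plus advection by the carrier flow."""
    ix, iy = state
    du, dv = roll_drift(ix, iy)
    ix = min(max(ix + action[0] + du, 0), NX - 1)
    iy = min(max(iy + action[1] + dv, 0), NY - 1)
    done = ix == NX - 1                 # destination: the right wall
    reward = 100.0 if done else -1.0    # -1 per unit of propelling energy
    return (ix, iy), reward, done

# Tabular Q-learning: the agent learns where drifting with the rolls
# shortens the trip and so lowers total propelling cost.
Q = {(ix, iy): [0.0] * len(ACTIONS) for ix in range(NX) for iy in range(NY)}
alpha, gamma, eps = 0.1, 0.95, 0.1
for _ in range(2000):
    state = (0, NY // 2)                # fixed origin on the left wall
    for _ in range(200):
        a = (random.randrange(len(ACTIONS)) if random.random() < eps
             else max(range(len(ACTIONS)), key=lambda i: Q[state][i]))
        nxt, r, done = step(state, ACTIONS[a])
        Q[state][a] += alpha * (r + gamma * max(Q[nxt]) - Q[state][a])
        state = nxt
        if done:
            break
```

A naive agent in this caricature would always choose the "propel right" action; the learned policy instead detours through regions where the rolls carry it toward the destination, which is the mechanism behind the energy savings reported above.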