Birds and experienced glider pilots frequently exploit atmospheric updrafts for long-distance flight, conserving energy by harvesting it from rising air. Inspired by this shared autonomous-soaring behavior, a reinforcement learning algorithm, the Twin Delayed Deep Deterministic policy gradient (TD3), is used to investigate the optimal strategy for an unpowered glider to harvest energy from thermal updrafts. A round updraft model characterizes updrafts of varied strengths, and a high-fidelity six-degree-of-freedom model captures the glider dynamics. Results for various initial flight positions and updraft strengths demonstrate the effectiveness of the strategy learned via reinforcement learning. To enhance the glider agent's updraft perception and broaden its applicability, an additional wind-velocity differential correction module is introduced into the algorithm, and a strategy symmetry method is applied. Comparison experiments on the round updraft, the Gedeon thermal model, and Dryden continuous turbulence show that these further optimizations play a crucial role in improving the updraft-sensing ability of the reinforcement learning glider agent. With these optimizations, a glider trained on a simplified thermal updraft with a simple training procedure acquires more effective flight strategies.
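To make the two updraft environments mentioned above concrete, the following is a minimal sketch of the vertical-wind profiles typically used in such studies; the Gaussian form of the round updraft and the specific parameter values (`w_max`, `radius`) are illustrative assumptions, not the paper's exact settings, while the Gedeon expression follows its standard form with a sink ring surrounding the rising core.

```python
import math

def round_updraft(r: float, w_max: float = 3.0, radius: float = 60.0) -> float:
    """Round thermal updraft (assumed Gaussian profile): vertical wind
    peaks at w_max in the core and decays with horizontal distance r
    from the thermal centre."""
    return w_max * math.exp(-((r / radius) ** 2))

def gedeon_updraft(r: float, w_max: float = 3.0, radius: float = 60.0) -> float:
    """Gedeon thermal model: w(r) = w_max * exp(-(r/R)^2) * (1 - (r/R)^2).
    Unlike the round model, it produces downdraft (negative w) in a
    ring outside r = R, which makes updraft sensing harder."""
    x = (r / radius) ** 2
    return w_max * math.exp(-x) * (1.0 - x)
```

The downdraft ring in the Gedeon model is what distinguishes the two test environments: an agent trained only on the round updraft never encounters sink near the core, which motivates the perception-enhancing corrections described in the abstract.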