Efficiently steering a tunnel boring machine (TBM) along the designed tunnel axis through an unknown and variable geological environment is a difficult but important task. At present, TBM attitude during tunneling is mostly controlled manually, based on the deviation between the actual tunneling axis and the designed tunnel axis and on the operators' experience. Under manual control, the tunneling axis often snakes around the designed tunnel axis and can even exceed the deviation limit. This paper attributes that behavior to three causes: the unknown geological environment, the hysteresis of the TBM position response, and the lack of overall optimization of the tunneling axis. To address them, the paper proposes a real-time optimal control framework for TBM attitude based on reinforcement learning. The framework comprises a geological information predictive model, a TBM attitude and position (TBMAP) predictive model, and an optimal attitude control policy (OACP). It predicts the current geological information in real time and provides the corresponding optimal attitude control, accounting for both the hysteresis of the TBM position response and the overall optimization of the tunneling axis. The framework can be deployed directly on a TBM without added cost or extensive modification of the equipment. To verify its effectiveness, the Xinjiang Yiner Water Supply Phase II Project, which uses the TBM method, was adopted as a case study. The results show that the accuracy of geological environment recognition reached 94%, that OACP reduced the accumulated deviation of the tunneling axis from the designed tunnel axis by over 80% compared with manual control, and that it can provide real-time decision support for attitude control in actual engineering.
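The reported reduction in accumulated deviation can be made concrete with a small sketch. The abstract does not define the metric, so the code below assumes one plausible reading: the sum of absolute offsets between the tunneling axis and the designed axis at sampled stations. The function names and the sample offsets are hypothetical and are not taken from the case study.

```python
from typing import Sequence


def accumulated_deviation(actual: Sequence[float], designed: Sequence[float]) -> float:
    """Sum of absolute offsets between the actual and designed axis.

    Assumption: both sequences hold one coordinate (e.g. horizontal offset
    in mm) sampled at the same stations along the tunnel.
    """
    if len(actual) != len(designed):
        raise ValueError("actual and designed must have the same length")
    return sum(abs(a - d) for a, d in zip(actual, designed))


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative offsets only (mm); not data from the paper.
    designed = [0.0, 0.0, 0.0, 0.0]
    manual = [12.0, -18.0, 25.0, -20.0]
    policy = [3.0, -4.0, 5.0, -3.0]

    dev_manual = accumulated_deviation(manual, designed)
    dev_policy = accumulated_deviation(policy, designed)
    print(f"reduction: {relative_reduction(dev_manual, dev_policy):.0%}")
```

Under this reading, a reduction of "over 80%" means the policy's accumulated deviation is less than one fifth of the manual-control value measured over the same stations.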