This paper proposes an online robust self-learning terminal sliding mode control (RS-TSMC) scheme with a stability guarantee for the balancing control of reaction wheel bicycle robots (RWBRs) under model uncertainties and disturbances. The scheme improves the balancing performance of the RWBR by optimising the constrained output of the TSMC. First, the TSMC is designed for a second-order mathematical model of the RWBR. Then, robust adaptive dynamic programming based on an actor-critic algorithm optimises the TSMC using only data sampled online. Closed-loop stability and convergence of the neural network weights are guaranteed through Lyapunov analysis. The effectiveness of the proposed algorithm is demonstrated through simulations and experiments.
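
To make the idea of a TSMC on a second-order model concrete, the following is a minimal sketch under stated assumptions, not the controller derived in this paper. It assumes a generic pendulum-like roll model, theta'' = a*sin(theta) + b*u + d, and uses one common nonsingular terminal sliding surface, s = theta + (1/beta)*sig(omega)^(p/q). The function names, the parameters a, b, beta, p, q, the switching gain, and the disturbance are all hypothetical values chosen for illustration; the actor-critic optimisation and the output constraint are not included.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sig(x, power):
    """Signed power sign(x) * |x|**power, defined for negative x."""
    return sign(x) * abs(x) ** power


def ntsmc_control(theta, omega, a, b, beta, p, q, gain):
    """Nonsingular terminal sliding mode law for the assumed model.

    Surface: s = theta + sig(omega, p/q) / beta, with 1 < p/q < 2.
    The law cancels the nominal term a*sin(theta) and adds a switching
    term whose gain must exceed the disturbance bound.
    """
    s = theta + sig(omega, p / q) / beta
    nominal = a * math.sin(theta)
    u = -(nominal
          + beta * (q / p) * sig(omega, 2.0 - p / q)
          + gain * sign(s)) / b
    return u, s


def simulate(theta0=0.1, omega0=0.0, t_end=5.0, dt=1e-3):
    """Forward-Euler simulation; returns a list of (t, theta, omega, u, s)."""
    a, b = 20.0, 5.0            # hypothetical model parameters
    beta, p, q = 1.0, 5, 3      # hypothetical surface parameters
    gain = 2.0                  # hypothetical gain, above the 0.5 bound on d
    theta, omega = theta0, omega0
    history = []
    for k in range(int(t_end / dt)):
        t = k * dt
        u, s = ntsmc_control(theta, omega, a, b, beta, p, q, gain)
        d = 0.5 * math.sin(2.0 * t)   # bounded disturbance, unknown to the law
        alpha = a * math.sin(theta) + b * u + d
        theta += dt * omega
        omega += dt * alpha
        history.append((t, theta, omega, u, s))
    return history


if __name__ == "__main__":
    final = simulate()[-1]
    print("final roll angle [rad]:", final[1])
```

The discontinuous switching term makes the control signal chatter in a discrete-time simulation; a boundary layer or a learned correction term are typical remedies, and both lie outside this sketch.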