Base stations (BSs) are the most energy-consuming segment of mobile networks. To reduce BS energy consumption, different BS components can be put to sleep when the BS is not active. Depending on the activation/deactivation time of the BS components, multiple sleep modes (SMs) are defined in the literature. In this study, we model the problem of BS energy saving with multiple sleep modes as a sequential Markov decision process (MDP) and propose an online, traffic-aware deep reinforcement learning approach to maximize long-term energy saving. However, there is a risk that the BS does not sleep at the right time and imposes large delays on users. To tackle this issue, we propose a digital twin (DT) model that encapsulates the dynamics of the investigated system and estimates the risk of decision-making (RDM) in advance. We define a novel metric to quantify the RDM and predict performance degradation. The RDM calculated by the DT is compared with a tolerable threshold set by the mobile operator. Based on this comparison, the BS can decide to deactivate the SMs, re-train when needed to avoid taking high risks, or activate the SMs to benefit from energy savings. For deep reinforcement learning, we use a long short-term memory (LSTM) network to capture the long- and short-term dependencies in the input traffic and to approximate the Q-function. We train the LSTM network using the experience replay method on a real traffic data set obtained from an operator's BS in Stockholm. The data set contains data rate information at a very coarse time granularity. Thus, we propose a scheme that uses the real network data set to generate a new data set which 1) has finer time granularity and 2) captures the bursty behavior of traffic data. Simulation results show that the proposed methods achieve considerable energy saving compared to the baselines, at the cost of a negligible number of delayed users. 
Moreover, the proposed digital twin model can proactively predict the performance of the deep Q-network (DQN) in terms of RDM, thereby preventing performance degradation in the network in anomalous situations.
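
The threshold comparison described above can be illustrated with a minimal sketch. The abstract does not state the exact decision rule, so everything here is an assumption made for illustration: the names `Action` and `decide`, the hypothetical `retrain_margin` parameter, and the choice to trigger re-training only when the RDM exceeds the operator's threshold by more than that margin.

```python
from enum import Enum


class Action(Enum):
    """Possible BS decisions (names are illustrative, not from the paper)."""
    ACTIVATE_SMS = "activate_sms"
    DEACTIVATE_SMS = "deactivate_sms"
    DEACTIVATE_AND_RETRAIN = "deactivate_and_retrain"


def decide(rdm: float, threshold: float, retrain_margin: float = 0.0) -> Action:
    """Map a predicted risk value to a BS decision.

    rdm            -- risk of decision-making estimated in advance by the digital twin
    threshold      -- tolerable risk level set by the mobile operator
    retrain_margin -- hypothetical slack above the threshold beyond which
                      the agent is also re-trained (an assumption)
    """
    if retrain_margin < 0:
        raise ValueError("retrain_margin must be non-negative")
    if rdm <= threshold:
        # Risk is tolerable: use sleep modes to save energy.
        return Action.ACTIVATE_SMS
    if rdm <= threshold + retrain_margin:
        # Risk slightly too high: keep the BS awake for now.
        return Action.DEACTIVATE_SMS
    # Risk far above the tolerable level: stay awake and re-train the agent.
    return Action.DEACTIVATE_AND_RETRAIN
```

For example, with a threshold of 0.05 and a margin of 0.02, an RDM of 0.01 would keep the sleep modes active, 0.06 would deactivate them, and 0.2 would additionally trigger re-training. The actual rule used in the study may differ.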