Millimetre-wave (mmWave) communication, operating between 30 and 300 GHz, suffers from intermittent, short-range transmission, so reconfigurable intelligent surfaces (RISs) are a promising way to extend its coverage. However, optimizing the phase shifts (PSs) of both the mmWave base station (BS) and the RIS to maximize the received spectral efficiency at the intended receiver is challenging because of the massive number of antenna elements involved. In this paper, an online learning approach is proposed that formulates this problem as a two-phase multi-armed bandit (MAB) game. In the first phase, the PS vector of the mmWave BS is adjusted; based on it, the PS vector of the RIS is calibrated in the second phase, and vice versa, alternating over the time horizon. The minimax optimal stochastic strategy (MOSS) MAB algorithm is used to implement the proposed two-phase MAB approach efficiently. Furthermore, to relax the requirement of estimating the channel state information (CSI) of both the mmWave BS and the RIS, codebook-based PSs are considered. Finally, numerical analysis confirms the superior performance of the proposed scheme, benchmarked against the optimal performance, under different scenarios.
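To make the two-phase bandit idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each BS codeword and each RIS codeword is one arm, that even time slots belong to the BS phase and odd slots to the RIS phase, and that the reward is a normalised spectral efficiency in [0, 1]. The MOSS index used here is one common form from the bandit literature, mean + sqrt(max(log(n / (K s)), 0) / s) for an arm pulled s times out of a horizon n with K arms. The helper names (`MOSS`, `two_phase_moss`), the mean-reward table `mean_se`, the noise model, and the even split of the horizon between the two learners are all hypothetical choices for illustration. Note that each learner sees a slightly non-stationary reward because the other phase keeps changing its codeword.

```python
import math
import random


class MOSS:
    """MOSS index policy for rewards in [0, 1] over a known horizon."""

    def __init__(self, n_arms, horizon):
        self.k = n_arms
        self.n = max(1, horizon)
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        # Pull every arm once before the index is defined.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm

        def index(arm):
            s = self.counts[arm]
            # Exploration bonus sqrt(max(log(n / (K s)), 0) / s); it vanishes
            # once an arm has been pulled n / K times.
            bonus = math.sqrt(max(math.log(self.n / (self.k * s)), 0.0) / s)
            return self.sums[arm] / s + bonus

        return max(range(self.k), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

    def most_pulled(self):
        return max(range(self.k), key=lambda arm: self.counts[arm])


def two_phase_moss(mean_se, horizon, rng):
    """Alternate MOSS over BS codewords (even slots) and RIS codewords (odd slots).

    mean_se[b][r] is a hypothetical normalised mean spectral efficiency in [0, 1]
    for BS codeword b and RIS codeword r; each observation adds bounded noise.
    """
    n_bs, n_ris = len(mean_se), len(mean_se[0])
    bs_learner = MOSS(n_bs, (horizon + 1) // 2)
    ris_learner = MOSS(n_ris, horizon // 2)
    b, r = 0, 0  # current BS and RIS codeword indices
    total = 0.0
    for t in range(horizon):
        bs_phase = t % 2 == 0
        if bs_phase:
            b = bs_learner.select()   # phase 1: adjust BS PS vector, RIS held fixed
        else:
            r = ris_learner.select()  # phase 2: calibrate RIS PS vector, BS held fixed
        noisy = mean_se[b][r] + rng.uniform(-0.1, 0.1)
        reward = min(1.0, max(0.0, noisy))
        if bs_phase:
            bs_learner.update(b, reward)
        else:
            ris_learner.update(r, reward)
        total += reward
    return bs_learner.most_pulled(), ris_learner.most_pulled(), total / horizon


if __name__ == "__main__":
    # Toy table: BS codeword 2 and RIS codeword 1 each add 0.15 to a 0.2 baseline.
    table = [[0.2 + 0.15 * (b == 2) + 0.15 * (r == 1) for r in range(4)] for b in range(4)]
    print(two_phase_moss(table, 4000, random.Random(0)))
```

In this toy setting the best pair is (2, 1), and the most-pulled arms of the two learners should settle on it; a real system would replace the table lookup with a measured spectral efficiency for the chosen BS and RIS codewords.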