A reconfigurable intelligent surface (RIS) is a promising technology that can extend short-range millimeter wave (mmWave) communications coverage. However, phase shifts (PSs) of both mmWave transmitter (TX) and RIS antenna elements need to be optimally adjusted to effectively cover a mmWave user. This paper proposes codebook-based phase shifters for mmWave TX and RIS to overcome the difficulty of estimating their mmWave channel state information (CSI). Moreover, to adjust the PSs of both, an online learning approach in the form of a multiarmed bandit (MAB) game is suggested, where a nested two-stage stochastic MAB strategy is proposed. In the proposed strategy, the PS vector of the mmWave TX is adjusted in the first MAB stage. Based on it, the PS vector of the RIS is calibrated in the second stage and vice versa over the time horizon. Hence, we leverage and implement two standard MAB algorithms, namely Thompson sampling (TS) and upper confidence bound (UCB). Simulation results confirm the superior performance of the proposed nested two-stage MAB strategy; in particular, the nested two-stage TS nearly matches the optimal performance.