SUMMARY
Driven by the increasing demand for high-data-rate applications and by the current crowding of the spectrum, Dynamic Spectrum Access (DSA) has become an active research topic. In this paper, we analyse DSA in the context of cellular networks, where a Coordinated Access Band (CAB) is shared between Radio Access Networks (RANs). We propose a Semi-Markov Decision Process (SMDP) approach to derive the optimal DSA policies in terms of operator reward. To overcome the limitations of implementing the optimal policy, we also propose two simple, though sub-optimal, DSA algorithms: a Q-learning (QL) based algorithm and a heuristic algorithm. The reward achieved with the heuristic algorithm is shown to be very close to the optimal one and thus to significantly exceed the reward obtained with Fixed Spectrum Access (FSA). The reward achieved with the QL-based algorithm is also shown to exceed that obtained with FSA. The higher rewards and better spectrum utilisation of the optimal and heuristic DSA methods are, however, obtained at the price of a reduced average user throughput.