This paper presents a methodology by which cognitive radio networks can form clusters using reinforcement learning-based principles. Each node repeatedly senses the received signal strength indicator (RSSI) of beacons transmitted by other nodes in the network. Nodes use this information to learn the significance of their position within the network and to decide whether to become cluster heads, thereby forming clusters. Extensive simulation results show that, on average, the clustering performance (packing efficiency) improves when nodes have the ability to learn. The results are compared with those of k-means and node degree-based approaches to cluster formation. The node degree scheme is found to be capable of degrading clustering performance, which is undesirable because it can increase the total energy consumption of the network over time. In a shadowing environment, clusters formed via RSSI-based learning can reduce their transmission power by up to 2 dBW (a potential power saving of 37 per cent) while achieving the same signal-to-noise ratio as the no learning and node degree schemes. Further energy can be conserved by restricting beacon transmissions to immediately neighbouring nodes. For certain geographical node distributions, a no learning scheme is preferred for cluster formation because of its lower overhead.
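
The quoted 37 per cent saving is consistent with a 2 dB reduction in transmission power, since decibels are a logarithmic ratio of linear powers. The following minimal sketch checks that arithmetic; the function name `power_saving_fraction` is an illustrative assumption and is not taken from the paper.

```python
def power_saving_fraction(reduction_db: float) -> float:
    """Fraction of linear power saved when transmit power drops by reduction_db decibels.

    A reduction of x dB scales linear power by 10 ** (-x / 10),
    so the saved fraction is one minus that ratio.
    """
    return 1.0 - 10.0 ** (-reduction_db / 10.0)


# Hypothetical check of the figure quoted in the abstract.
saving = power_saving_fraction(2.0)
print(f"{saving:.1%}")  # prints 36.9%
```

Rounded to the nearest whole number this gives 37 per cent, matching the stated potential saving.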