Machine-type communications (MTC) are expected to account for half of all internet connections by 2030. The massive MTC (mMTC) use case lets applications connect a massive number of low-power, low-complexity devices, which creates challenges in resource allocation. Moreover, because mMTC networks are ultra-dense, rigid random access schemes perform poorly in them. This paper therefore proposes a Q-Learning-based random access method for mMTC that combines device clustering with non-orthogonal multiple access (NOMA). A traditional NOMA implementation increases spectral efficiency but also demands a larger Q-Table, which slows convergence, an effect known to be highly detrimental in massive networks. To mitigate this drawback, we pre-cluster devices through short-range device-to-device technology, allowing them to operate with a smaller Q-Table. Furthermore, selecting partner devices in advance allows us to implement a full-feedback-based reward mechanism so that clusters avoid time slots that have already been successfully allocated. Additionally, to cope with the negative impact of system overload, we propose an adaptive frame size algorithm that runs at the base station (BS). It adjusts the frame size to the network load, preventing idle slots when the network is underloaded and providing extra slots when it is overloaded. The results show that the proposed method substantially improves throughput. We also analyze the impact of clustering, of the cluster size, and of frame size adaptation.
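As a rough illustration of the mechanism summarized above, the following Python sketch shows one way a Q-Learning slot selection with full feedback and a BS-side adaptive frame size could fit together. It is not the algorithm evaluated in the paper. The reward values (+1 and -1), the learning rate, the grow/shrink rule in `adapt_frame_size`, and all function and parameter names are assumptions made for illustration. Each cluster is treated as a single learning agent, so the NOMA power-domain pairing inside a cluster is not modeled.

```python
import random


def choose_slot(q_row, rng):
    """Pick the slot with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([s for s, v in enumerate(q_row) if v == best])


def update_q(q_row, slot, reward, alpha=0.1):
    """Stateless Q-Learning update: move Q[slot] toward the observed reward."""
    q_row[slot] += alpha * (reward - q_row[slot])


def adapt_frame_size(frame_size, n_idle, n_collided, step=1, min_size=1):
    """Hypothetical BS rule: grow the frame when collided slots outnumber
    idle slots (overload), shrink it when idle slots outnumber collided
    slots (underload), and keep it otherwise."""
    if n_collided > n_idle:
        return frame_size + step
    if n_idle > n_collided:
        return max(min_size, frame_size - step)
    return frame_size


def simulate(n_clusters=8, frame_size=4, n_frames=300, seed=0):
    """Run the toy model and return (final frame size, throughput, Q-Tables).

    Throughput here is successful slots divided by offered slots.
    """
    rng = random.Random(seed)
    q = [[0.0] * frame_size for _ in range(n_clusters)]
    successes = 0
    slots_offered = 0

    for _ in range(n_frames):
        picks = [choose_slot(row, rng) for row in q]
        counts = [picks.count(s) for s in range(frame_size)]

        for c, own in enumerate(picks):
            for s in range(frame_size):
                if s == own:
                    # Reward a collision-free transmission, punish a collision.
                    update_q(q[c], s, 1.0 if counts[s] == 1 else -1.0)
                elif counts[s] == 1:
                    # Assumed full feedback: the slot was successfully taken
                    # by another cluster, so discourage choosing it.
                    update_q(q[c], s, -1.0)

        successes += counts.count(1)
        slots_offered += frame_size

        n_idle = counts.count(0)
        n_collided = sum(1 for n in counts if n > 1)
        new_size = adapt_frame_size(frame_size, n_idle, n_collided)

        # Keep every Q-Table row in sync with the new frame size.
        for row in q:
            if new_size > len(row):
                row.extend([0.0] * (new_size - len(row)))
            else:
                del row[new_size:]
        frame_size = new_size

    return frame_size, successes / slots_offered, q


if __name__ == "__main__":
    size, throughput, _ = simulate()
    print(f"final frame size: {size}, throughput: {throughput:.3f}")
```

In this sketch the Q-Table of each agent has one entry per slot, so its width follows the frame size chosen by the BS. Starting with fewer slots than clusters (4 slots for 8 clusters by default) exercises the overload branch of the frame size rule.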