This paper presents a new framework that combines uplink cell-free massive multiple-input multiple-output (mMIMO) with non-orthogonal multiple access (NOMA) to serve Internet-of-Things (IoT) devices in an uncoordinated manner. We investigate the benefits of reinforcement learning in supporting the massive connectivity and quality-of-service (QoS) requirements of IoT devices. Using a multi-armed bandit technique, each device selects its subband and transmit power jointly, without any cooperation with other devices, so as to strike a balance between QoS and transmit power. When combined with dynamic cooperation clustering of the serving access points (APs), the allocation technique is shown to converge quickly while incurring a negligible performance loss relative to a system that relies on all deployed APs to serve the devices.
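
To make the uncoordinated selection concrete, the following is a minimal sketch of how a device could treat each (subband, transmit power) pair as one arm of a multi-armed bandit. It is an illustration under stated assumptions, not the algorithm evaluated in this paper: the epsilon-greedy rule, the reward `toy_reward`, the QoS rule inside `simulate` (a stand-in for a real SINR computation), and all names and parameter values are hypothetical.

```python
import random


def make_arms(num_subbands, power_levels):
    """Each arm is one (subband index, transmit power) pair."""
    return [(s, p) for s in range(num_subbands) for p in power_levels]


class EpsilonGreedyDevice:
    """One device learning on its own; no information is exchanged with others."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = arms
        self.epsilon = epsilon
        self.counts = [0] * len(arms)    # number of pulls per arm
        self.values = [0.0] * len(arms)  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.values[i])

    def update(self, arm_index, reward):
        # Incremental update of the sample mean for the pulled arm.
        self.counts[arm_index] += 1
        n = self.counts[arm_index]
        self.values[arm_index] += (reward - self.values[arm_index]) / n


def toy_reward(qos_met, power, max_power, weight=0.5):
    """Hypothetical reward: 1 if the QoS target is met, minus a power penalty."""
    return (1.0 if qos_met else 0.0) - weight * power / max_power


def simulate(num_devices=4, num_subbands=2, power_levels=(0.25, 0.5, 1.0),
             max_share=2, min_power=0.5, rounds=2000, seed=0):
    """Toy environment (assumption): QoS is met when at most `max_share`
    devices share the subband and the chosen power is at least `min_power`."""
    arms = make_arms(num_subbands, power_levels)
    devices = [EpsilonGreedyDevice(arms, seed=seed + d) for d in range(num_devices)]
    max_power = max(power_levels)
    for _ in range(rounds):
        picks = [dev.select() for dev in devices]
        subbands = [arms[i][0] for i in picks]
        for dev, i in zip(devices, picks):
            s, p = arms[i]
            qos_met = subbands.count(s) <= max_share and p >= min_power
            dev.update(i, toy_reward(qos_met, p, max_power))
    # Report the arm each device currently rates highest.
    return [arms[max(range(len(arms)), key=lambda i: dev.values[i])]
            for dev in devices]


if __name__ == "__main__":
    print(simulate())
```

The weight in `toy_reward` sets the balance between meeting the QoS target and saving transmit power; a larger weight pushes devices toward lower power levels at the cost of more QoS violations.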