Radio frequency (RF) machine learning (ML) has advanced significantly, yet much of the existing research remains confined to proof-of-concept demonstrations or narrowly defined scenarios that do not capture real-world operational conditions. As ML algorithms increasingly assume decision-making roles in this domain, ensuring the reliability and validity of their outputs becomes imperative. This paper seeks to expand the discourse on ML assurance within the RF ML domain. Building on the RiftNet deep learning classifier, it investigates the robustness of that classifier as a means of instilling confidence in RF ML applications. Specifically, the paper examines the challenges posed by inaccurately labeled data during the development of artificial intelligence agents, the quantification of uncertainty in model outputs, and the implementation of calibration techniques for RF modulation recognition. In essence, this research marks a shift from the initial successes of RF ML toward techniques that inspire confidence in real-world, edge-system applications.