“…For a given problem within the RFML space, once a reliable training routine and a network of sufficient size have been identified, how well a trained network solves the problem often comes down to the quantity and quality of the available data [7]. Effectively, three sources of data can be used to train networks within the RFML space: simulated/synthetic [5]; captured/collected [4,6,11,12,18,23,35,37,48,54,68–85]; and augmented [11,47,48,64,71,78], which combines the first two either by using domain knowledge (the focus of this work) or by using generative adversarial networks (GANs), as in Davaslioglu et al. [47]. Owing to the nature of the RFML data space, simulated data are inexpensive thanks to open-source toolkits like GNU Radio, where observations can be generated uniquely in parallel, with the only bottleneck being the available computing resources [30]. By comparison, an over-the-air (OTA) collection costs many orders of magnitude more in time and money, because hardware transceivers must be procured and configured and data must be generated in real time rather than in parallel as in simulation; moreover, all the work needed to simulate the data is still required when not examining commercial off-the-shelf (COTS) equipment…”
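To make the "augmented" category above more concrete, the following is a minimal sketch of domain-knowledge augmentation: a single captured observation is turned into several new ones by applying impairments that RF domain knowledge says are plausible. The specific impairments (carrier frequency offset, static phase rotation, additive white Gaussian noise), the parameter ranges, and the names `augment_capture` and `make_augmented_set` are all assumptions chosen for illustration; the excerpt does not state which transformations the cited work uses.

```python
import cmath
import math
import random


def augment_capture(iq, snr_db, freq_offset, phase, rng):
    """Return a new observation derived from a captured one (hypothetical helper).

    iq          : list of complex baseband samples (the captured observation)
    snr_db      : ratio of captured-signal power to added-noise power, in dB
    freq_offset : carrier frequency offset, in cycles per sample
    phase       : static phase rotation, in radians
    rng         : random.Random instance, so results are reproducible
    """
    # The average power of the captured samples sets the noise scale.
    power = sum(abs(s) ** 2 for s in iq) / len(iq)
    noise_power = power / (10 ** (snr_db / 10))
    # Complex AWGN: half of the noise power goes into each of I and Q.
    sigma = math.sqrt(noise_power / 2)

    out = []
    for n, s in enumerate(iq):
        # Frequency offset is a phase ramp; the static phase is a constant term.
        rot = cmath.exp(1j * (2 * math.pi * freq_offset * n + phase))
        noise = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        out.append(s * rot + noise)
    return out


def make_augmented_set(iq, count, seed=0):
    """Draw `count` augmented observations with assumed impairment ranges."""
    rng = random.Random(seed)
    return [
        augment_capture(
            iq,
            snr_db=rng.uniform(0, 20),             # assumed range
            freq_offset=rng.uniform(-0.01, 0.01),  # assumed range
            phase=rng.uniform(0, 2 * math.pi),
            rng=rng,
        )
        for _ in range(count)
    ]


# Toy stand-in for a captured observation: 64 unit-power QPSK-like samples.
capture = [cmath.exp(1j * math.pi / 4 * (2 * k + 1)) for k in range(64)]
augmented = make_augmented_set(capture, count=5, seed=1)
```

Each call produces a different observation from the same capture, which is the sense in which augmentation sits between simulation (cheap, parallel) and collection (expensive, real time). A GAN-based approach, the other route the excerpt mentions, is not shown here.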