The practical realization of end-to-end training of communication systems is fundamentally limited by the accessibility of the channel gradient. To overcome this major obstacle, generative adversarial networks (GANs) that learn to mimic the actual channel behavior have recently been proposed in the literature. In contrast to handcrafted classical channel models, which can never fully capture the real world, GANs promise, in principle, the ability to learn any physical impairment through a data-driven learning algorithm. In this work, we verify the concept of GAN-based autoencoder training in actual over-the-air (OTA) measurements. To improve training stability, we first extend the concept to conditional Wasserstein GANs and embed it into a state-of-the-art autoencoder architecture, including bitwise estimates and an outer channel code. Further, in the same framework, we compare three existing training approaches: model-based pre-training with receiver finetuning, reinforcement learning (RL), and GAN-based channel modeling. In this comparison, we show the advantages and limitations of GAN-based end-to-end training. In particular, for non-linear effects, it turns out that learning the whole exploration space becomes prohibitively complex. Finally, we show that the GAN-based training strategy benefits from simpler (training) data acquisition compared to RL-based training, which requires continuous transmitter weight updates. This requirement becomes an important practical bottleneck due to the limited bandwidth and latency between the transmitter and the training algorithm, which may even operate at physically different locations.