In this retrospective study, 10,000 anteroposterior (AP) knee radiographs from a single institution were used to create a medical data set that is more balanced and cheaper to produce. Two types of generative adversarial networks were used: the deep convolutional GAN (DCGAN) and StyleGAN2 with adaptive discriminator augmentation (StyleGAN2-ADA). To verify the quality of the images generated by StyleGAN2-ADA relative to real ones, a Visual Turing test was conducted with two computer vision experts, two orthopedic surgeons, and a musculoskeletal radiologist. For quantitative analysis, the Fréchet inception distance (FID) and principal component analysis (PCA) were used. The generated images reproduced the features of osteophytes, joint space narrowing, and sclerosis. The classification accuracies of the experts were 34%, 43%, 44%, 57%, and 50%. The FID between the generated and real images was 2.96, which is substantially smaller than that reported for another medical data set (BreCaHAD, FID = 15.1). PCA showed no significant difference between the principal components of the real and generated images (p > 0.05). At least 2,000 training images were required to generate reliable images. By performing PCA in the latent space, we were able to control the principal component that shows the progression of arthritis. Using a GAN, we generated knee X-ray images that accurately reflect the characteristics of each arthritis progression stage and that neither human experts nor artificial intelligence could distinguish from the real images. In summary, our work opens up the potential of adopting generative models to synthesize realistic, anonymous images that can also mitigate data scarcity and class imbalance.
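The latent-space editing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes latent codes sampled from a trained generator (here replaced by random placeholders), computes principal components with an SVD, and shifts a code along the first component. In the paper's setting, one such component tracked arthritis severity; the component index, latent dimensionality (512), and step sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for latent codes from a trained generator; in practice these
# would be w-vectors from StyleGAN2-ADA's mapping network, not random noise.
latents = rng.normal(size=(2000, 512))

# PCA via SVD on the mean-centered latent codes.
mean = latents.mean(axis=0)
_, _, vt = np.linalg.svd(latents - mean, full_matrices=False)
pc1 = vt[0]  # first principal component (unit vector)

# Edit one latent code by moving along PC1; feeding each edited code to the
# generator would (under the paper's finding) vary arthritis severity.
z = rng.normal(size=512)
alphas = np.linspace(-3.0, 3.0, 7)  # illustrative step sizes
edited = np.stack([z + a * pc1 for a in alphas])
print(edited.shape)
```

Each row of `edited` would then be decoded by the generator to produce a sequence of images sweeping along the chosen component.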