The emergence of novel AI technologies and increasingly portable wearable devices has opened a wider, less constrained range of avenues for communication and interaction between humans and virtual environments. In this context, the emotions and movements that users express may convey a variety of meanings. Consequently, an emerging challenge is how to automatically enhance the visual representation of such interactions. Here, a novel Generative Adversarial Network (GAN) based model, AACOGAN, is introduced to tackle this challenge. The AACOGAN model establishes a relationship between player interactions, object locations, and camera movements, and then generates camera shots that augment player immersion. Experimental results demonstrate that AACOGAN enhances the correlation between player interactions and camera trajectories by an average of 73%, and improves multi-focus scene quality by up to 32.9%. AACOGAN thus offers an efficient and economical solution for generating camera shots appropriate for a wide range of interactive motions. Exemplary video footage can be found at https://youtu.be/Syrwbnpzgx8.