Research on training Generative Adversarial Networks (GANs) to create 3D human body avatars from 2D datasets is underway. Work in this field shows promise and has paved the way for significant advances in a variety of applications, including virtual reality, sports analysis, cinematography, and surveillance. By avoiding obstacles and producing high-resolution, information-rich multi-view RGB images, drone active tracking combined with aerial photography sensors can eliminate occlusions and enable 3D avatar body reconstruction. Owing to several issues, such as restricted perspective coverage, pronounced occlusions, and texture disappearance, 3D avatar reconstruction techniques suffer training failures that cause distortions and feature loss in the reconstructed models. This paper presents PIXGAN-Drone, a new end-to-end trainable deep neural network methodology for photorealistic 3D human body avatar reconstruction from multi-view images. It integrates active tracking drones equipped with aerial photography sensors (a stable automatic circular motion system) into the Pix2Pix GAN training framework to generate high-resolution 2D models; conditional image-to-image translation combined with dynamic aerial perspectives yields realistic and accurate 3D models. Experiments on multiple datasets demonstrate that our method outperforms state-of-the-art methods on several metrics (Chamfer, P2S, and CED). Our 3D reconstructed human avatars achieved (Chamfer, P2S, CED) errors of 0.0293, 0.0271, and 0.0232 on RenderPeople; 0.0133, 0.0136, and 0.0050 on People Snapshot (indoor); 0.0154, 0.0101, and 0.0063 on People Snapshot (outdoor); and 0.0316, 0.0275, and 0.0216 on our custom drone-collected dataset.

INDEX TERMS 3D human avatar, 3D reconstruction, PIX2PIXGAN, body model rendering, drone active tracking.
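
To make the conditional image-to-image translation objective concrete, the sketch below shows a minimal Pix2Pix-style training step: a generator maps a conditioning view to a target view, a PatchGAN discriminator scores (source, image) pairs, and the generator loss combines an adversarial term with an L1 reconstruction term. The network sizes, names, and single-step loop are illustrative assumptions for exposition, not the paper's implementation.

```python
# Minimal sketch (assumed toy networks, not the authors' code) of the
# Pix2Pix-style conditional GAN objective referenced in the abstract.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Toy stand-in for the U-Net generator used in Pix2Pix."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class TinyPatchDiscriminator(nn.Module):
    """Toy PatchGAN: classifies overlapping patches of a (source, image) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),  # per-patch logits
        )
    def forward(self, src, img):
        return self.net(torch.cat([src, img], dim=1))

G, D = TinyGenerator(), TinyPatchDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

src = torch.rand(1, 3, 64, 64)   # conditioning view (e.g. one captured frame)
tgt = torch.rand(1, 3, 64, 64)   # paired ground-truth target view

# Discriminator step: real pairs should score 1, fake pairs 0.
fake = G(src)
d_real = D(src, tgt)
d_fake = D(src, fake.detach())
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool D, plus L1 reconstruction (weight 100 as in Pix2Pix).
d_fake = D(src, fake)
loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, tgt)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```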
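
The reported Chamfer and P2S numbers are distances between reconstructed and ground-truth geometry. Below is a minimal sketch of how these two errors are commonly computed on point clouds, approximating the ground-truth surface by a dense point sample; the function names and random stand-in data are assumptions, and this is not the paper's evaluation code.

```python
# Common point-cloud formulations of Chamfer and point-to-surface (P2S)
# errors; clouds are (N, 3) NumPy arrays, and the ground-truth surface is
# approximated by a dense point sample (an assumption of this sketch).
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer: mean nearest-neighbour distance in both directions."""
    d_pred_to_gt, _ = cKDTree(gt).query(pred)   # each predicted point -> closest GT point
    d_gt_to_pred, _ = cKDTree(pred).query(gt)   # each GT point -> closest predicted point
    return 0.5 * (d_pred_to_gt.mean() + d_gt_to_pred.mean())

def p2s_distance(pred: np.ndarray, gt_surface_samples: np.ndarray) -> float:
    """One-sided P2S: predicted points to the (sampled) ground-truth surface."""
    d, _ = cKDTree(gt_surface_samples).query(pred)
    return float(d.mean())

# Usage on random stand-in clouds:
rng = np.random.default_rng(0)
pred = rng.random((1000, 3))
gt = rng.random((1000, 3))
print(chamfer_distance(pred, gt), p2s_distance(pred, gt))
```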