Navigating multiple drones autonomously through complex, unpredictable environments such as forests poses a significant challenge, one typically addressed with wireless communication for coordination. However, this approach falls short when central control is limited or communication links are blocked. Addressing this gap, our paper explores how multiple drones with limited vision can learn complex behaviors. Drones in the swarm rely on onboard sensors, primarily forward-facing stereo cameras, for environmental perception and neighbor detection. They learn complex maneuvers by imitating a privileged expert system, that is, by finding the neural network parameters that best map sensory perception to control commands. The training process adopts the DAgger algorithm within a centralized-training, decentralized-execution framework. With this technique, drones rapidly learn complex behaviors such as avoiding obstacles, coordinating their movements, and navigating to specified targets, all without wireless communication. This paper details the construction of a distributed multi-UAV cooperative motion model under limited vision, emphasizing the autonomy of each drone in achieving coordinated flight and obstacle avoidance. Our methodology and experimental results demonstrate the effectiveness of the proposed vision-based end-to-end controller, paving the way for more sophisticated applications of multi-UAV systems in complex, real-world scenarios.
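To make the training scheme concrete, the sketch below illustrates a minimal DAgger-style loop with centralized training and decentralized execution: a student policy that sees only per-drone observations is rolled out, every visited state is relabeled by a privileged expert with access to the full simulator state, and the aggregated dataset is used to refit the student. All names (`expert_action`, `StudentPolicy`, `env_step`, dimensions) are hypothetical placeholders, not the paper's actual implementation.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the paper).
OBS_DIM, ACT_DIM, NUM_DRONES = 64, 4, 5

def expert_action(full_state):
    """Privileged expert: maps the full simulator state to a control command (placeholder law)."""
    return np.tanh(full_state[:ACT_DIM])

class StudentPolicy:
    """Shared vision-based policy, executed independently on each drone."""
    def __init__(self):
        self.W = np.zeros((ACT_DIM, OBS_DIM))

    def act(self, obs):
        # Decentralized execution: uses only this drone's own observation.
        return self.W @ obs

    def fit(self, obs_batch, act_batch):
        # Behavior-cloning step: least-squares regression onto expert labels.
        self.W = np.linalg.lstsq(obs_batch, act_batch, rcond=None)[0].T

def dagger(env_step, n_iters=10, rollout_len=200):
    """Minimal DAgger loop: roll out the student, relabel with the expert,
    aggregate the dataset, and retrain centrally."""
    policy = StudentPolicy()
    dataset_obs, dataset_act = [], []
    for _ in range(n_iters):
        obs = np.random.randn(NUM_DRONES, OBS_DIM)
        full_state = np.random.randn(NUM_DRONES, OBS_DIM)
        for _ in range(rollout_len):
            actions = np.stack([policy.act(o) for o in obs])
            # Relabel each visited state with the privileged expert's action.
            labels = np.stack([expert_action(s) for s in full_state])
            dataset_obs.append(obs.copy())
            dataset_act.append(labels)
            obs, full_state = env_step(actions)
        policy.fit(np.concatenate(dataset_obs), np.concatenate(dataset_act))
    return policy

if __name__ == "__main__":
    # Dummy environment step standing in for the multi-drone simulator.
    def env_step(actions):
        return (np.random.randn(NUM_DRONES, OBS_DIM),
                np.random.randn(NUM_DRONES, OBS_DIM))

    trained = dagger(env_step)
```

In the actual system, the student's input would be stereo-camera features rather than random vectors, and the expert would be a planner with access to ground-truth obstacle and neighbor positions; the sketch only conveys the dataset-aggregation structure of DAgger.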