The development of autonomous, fast, agile small Unmanned Aerial Vehicles (UAVs) raises fundamental challenges: fast and agile maneuvering in dynamic environments, unreliable state estimation, imperfect sensing, and the coupling of action and perception in real time under severe resource constraints. Autonomous drone racing exemplifies these challenges as a research problem at the intersection of computer vision, planning, state estimation, and control. To address them, we propose an approach to autonomous, perception-action-aware, vision-based drone racing in a photorealistic environment. Our approach integrates a deep convolutional neural network (CNN) with state-of-the-art path planning, state estimation, and control algorithms. The deep learning method uses computer vision to detect the gates and estimate the flyable area. The planner and controller then use this information to generate a short minimum-snap trajectory segment and send the corresponding motor commands to reach the desired goal. We thoroughly evaluate the proposed methodology in the Gazebo and FlightGoggles (photorealistic sensor simulation) environments. Extensive experiments demonstrate that the proposed approach outperforms state-of-the-art methods and flies the drone more consistently than many human pilots. Moreover, the proposed system successfully guided the drone through the tight race courses of the 2019 AlphaPilot Challenge, reaching speeds of up to 7 m/s.