Commercial unmanned aerial vehicles have drawn considerable attention from the e-commerce industry because of their suitability for last-mile delivery. However, efficiently coordinating multiple aerial vehicles for delivery under operational constraints and uncertainties remains an open problem. The main challenge in planning is scalability: the planning space grows exponentially with the number of agents, and relying on human supervisors to structure the problem is impractical in large-scale settings. Algorithms based on Deep Q-Networks have achieved unprecedented success in decision-making problems, but extending them to multi-agent problems is hindered by scalability issues. This work proposes an approach that improves the performance of Deep Q-Networks on multi-agent delivery-by-drone problems by employing state decomposition to reduce problem complexity, Curriculum Learning to manage exploration complexity, and Genetic Algorithms to search the combinatorial solution space for efficient packet–drone matchings. The performance of the proposed method is demonstrated on a multi-agent delivery-by-drone problem with 10 agents and approximately 10⁷⁷ state–action pairs. Comparative simulation results are provided to demonstrate the merit of the proposed method: the Genetic-Algorithm-aided multi-agent deep reinforcement learning approach outperforms the alternatives in terms of scalability and convergence behavior.