This research proposes an efficient algorithm for the green vehicle routing problem (GVRP) that generates high-quality solutions while accounting for environmental impact and computational cost. We introduce a discrete grey wolf optimizer (DGWO), a discrete variant of the grey wolf optimizer (GWO), and use Q-learning (QL) to optimize its parameters. The DGWO relies on the 2-opt technique and the Hamming distance, which make it suitable for discrete problems such as the GVRP. The key novelty of our approach is the use of QL to tune two DGWO parameters: the number of iterations and the number of wolves. QL-based tuning improves both the efficiency and the effectiveness of the algorithm relative to DGWO with manual parameter tuning. The proposed discrete grey wolf optimizer-Q-learning (DGWO-QL) algorithm is validated extensively on GVRP benchmark instances. On smaller instances comprising 20 customers and 3 stations, it outperforms the compared methods in 12 of 16 instances; on larger instances with 111 to 500 customers and 21 stations, it does so in 10 of 12 instances. Compared with existing methods, it improves both solution quality and computational efficiency. The results also show that DGWO-QL performs robustly, particularly under stochastic route scenarios. This study contributes to the literature by demonstrating the potential of DGWO-QL to generate high-quality solutions for the GVRP.
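
For readers unfamiliar with the two discrete building blocks named in the abstract, the following minimal Python sketch illustrates the Hamming distance between two routes and a basic 2-opt local search. It is an illustration under stated assumptions, not the authors' implementation: the function names, the list-based route encoding, the Euclidean closed-tour cost, and the toy coordinates are all hypothetical, and the abstract does not specify how DGWO combines these operators or how QL selects the parameters.

```python
import math


def hamming_distance(route_a, route_b):
    """Count the positions at which two equal-length routes differ."""
    if len(route_a) != len(route_b):
        raise ValueError("routes must have the same length")
    return sum(a != b for a, b in zip(route_a, route_b))


def two_opt_move(route, i, k):
    """Return a copy of route with the segment route[i..k] reversed."""
    return route[:i] + route[i:k + 1][::-1] + route[k + 1:]


def route_length(route, coords):
    """Euclidean length of the closed tour that visits coords in route order."""
    n = len(route)
    return sum(
        math.dist(coords[route[j]], coords[route[(j + 1) % n]])
        for j in range(n)
    )


def two_opt(route, coords):
    """2-opt local search: accept improving segment reversals until none remain.

    The first node is kept fixed (assumed here to be the depot).
    """
    best = list(route)
    best_len = route_length(best, coords)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for k in range(i + 1, len(best)):
                candidate = two_opt_move(best, i, k)
                candidate_len = route_length(candidate, coords)
                # Small tolerance avoids cycling on floating-point ties.
                if candidate_len < best_len - 1e-12:
                    best, best_len = candidate, candidate_len
                    improved = True
    return best


if __name__ == "__main__":
    # Toy data (assumed): four nodes on the corners of a unit square.
    coords = [(0, 0), (1, 0), (1, 1), (0, 1)]
    crossing = [0, 2, 1, 3]  # this tour crosses itself
    untangled = two_opt(crossing, coords)
    print(untangled, route_length(untangled, coords))
    print(hamming_distance(crossing, untangled))
```

In a discrete swarm method, a distance of this kind can stand in for the continuous distance between a wolf and the pack leaders, while 2-opt serves as the local improvement step; the exact roles in DGWO-QL are described in the body of the paper, not here.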