In recent years, integrated production and distribution scheduling (IPDS) has become an important subject in supply chain management. However, IPDS in distributed manufacturing environments has rarely been studied, and reinforcement learning has seldom been combined with metaheuristics to solve IPDS problems. In this work, an integrated distributed flow shop and distribution scheduling problem is studied, and a mathematical model is provided. Because the problem is NP-hard, a multi-objective Q-learning-based brain storm optimization algorithm is designed to minimize the makespan and the total weighted earliness and tardiness. The approach uses a double-string solution representation, and a dynamic clustering method is developed for the clustering phase. In the generating phase, a global search strategy, a local search strategy, and a simulated annealing strategy are introduced, and a Q-learning process dynamically chooses which generation strategy to apply. The Q-learning process consists of four actions defined as combinations of these strategies, four states described by convergence and uniformity metrics, a reward function, and an improved ε-greedy method. In the selecting phase, a newly defined selection method is adopted. To assess the effectiveness of the proposed approach, numerical experiments and statistical tests are conducted against four prevalent metaheuristics and the CPLEX optimizer. The results suggest that the designed approach outperforms its competitors in obtaining promising solutions for the considered problem.
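As a rough illustration of the strategy-selection idea, the following minimal sketch shows tabular Q-learning over four states and four actions with a plain ε-greedy rule. It is not the paper's implementation: the state encoding (two boolean flags for whether convergence and uniformity improved), the learning rate `alpha`, the discount factor `gamma`, and all function names are assumptions made for illustration, and the standard ε-greedy rule stands in for the improved variant mentioned above.

```python
import random

N_STATES = 4   # four states, indexed 0..3 (encoding below is an assumption)
N_ACTIONS = 4  # four actions, each standing for a combination of generation strategies


def classify_state(convergence_improved, uniformity_improved):
    """Map two boolean indicators to one of four state indices (hypothetical encoding)."""
    return 2 * int(convergence_improved) + int(uniformity_improved)


def choose_action(q_table, state, epsilon, rng):
    """Plain epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: row[a])


def update_q(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; alpha and gamma are assumed values."""
    best_next = max(q_table[next_state])
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error


def new_q_table():
    """Q-table initialized to zero for every state-action pair."""
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]
```

In an optimization loop, each iteration would classify the current population into a state, pick an action with `choose_action`, apply the corresponding generation strategies, compute a reward from the resulting change in solution quality, and call `update_q`.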