A multi-agent iterative optimisation method based on deep reinforcement learning is proposed for the balancing and sequencing problem in mixed-model assembly lines. Building on a Markov decision process model of balancing and sequencing, a balancing agent using the deep deterministic policy gradient algorithm, a sequencing agent using an Actor-Critic algorithm, and an iterative interaction mechanism between the agents' output solutions are designed to realise global optimisation of the mixed-model assembly line. The exchange of solution information, including assembly times and station workloads, during the iterative interaction coordinates the worker assignment policy at the balancing stage with the production arrangement policy at the sequencing stage, so as to minimise work overload and idle time at stations. Comparative experiments with heuristic rules, genetic algorithms, and the original deep reinforcement learning algorithms demonstrate and discuss the effectiveness of the proposed method on both small-scale and large-scale instances.
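To make the iterative interaction concrete, the sketch below outlines in plain Python how a balancing agent and a sequencing agent might exchange solution information over successive iterations: the balancing stage passes station capacities (a stand-in for worker assignment) to the sequencing stage, and the sequencing stage feeds station workloads back to the balancing stage. The class names, task times, demand figures, and simplified overload-plus-idle cost are illustrative assumptions, and the learned DDPG and Actor-Critic policies of the proposed method are replaced here by simple stubs; this is a structural sketch, not the authors' implementation.

```python
import random

# Hypothetical problem data: task times per model and number of stations (assumed values).
TASK_TIMES = {"model_A": [4, 6, 3, 5], "model_B": [5, 4, 6, 3]}
NUM_STATIONS = 4
DEMAND = {"model_A": 3, "model_B": 3}  # units of each model to launch per iteration


class BalancingAgent:
    """Stand-in for the DDPG-based balancing agent: assigns capacity (workers) to stations."""

    def act(self, workload_feedback):
        # A real agent would map the state (including workload feedback from sequencing)
        # to a continuous worker-assignment action via DDPG. This stub simply grants
        # more capacity to stations that were more heavily loaded last iteration.
        if not workload_feedback:
            return [1.0] * NUM_STATIONS
        max_load = max(workload_feedback)
        return [1.0 + 0.5 * (load / max_load) for load in workload_feedback]


class SequencingAgent:
    """Stand-in for the Actor-Critic sequencing agent: orders model launches."""

    def act(self, capacity_per_station):
        # A real agent would select the next model from a learned policy conditioned on
        # the balancing solution; this stub just shuffles the demanded units.
        sequence = [m for m, qty in DEMAND.items() for _ in range(qty)]
        random.shuffle(sequence)
        return sequence


def evaluate(sequence, capacity):
    """Simplified line simulation: per-station workload and combined overload + idle time."""
    workload = [0.0] * NUM_STATIONS
    for model in sequence:
        for station, task_time in enumerate(TASK_TIMES[model]):
            workload[station] += task_time / capacity[station]
    cycle = sum(workload) / NUM_STATIONS
    overload = sum(max(0.0, w - cycle) for w in workload)
    idle = sum(max(0.0, cycle - w) for w in workload)
    return workload, overload + idle


def iterative_optimisation(iterations=10, seed=0):
    """Alternate between balancing and sequencing, exchanging solution information."""
    random.seed(seed)
    balancer, sequencer = BalancingAgent(), SequencingAgent()
    workload, best_cost = None, float("inf")
    for _ in range(iterations):
        capacity = balancer.act(workload)    # balancing stage uses workload feedback
        sequence = sequencer.act(capacity)   # sequencing stage uses the balancing output
        workload, cost = evaluate(sequence, capacity)
        best_cost = min(best_cost, cost)     # track the best combined overload + idle
    return best_cost


if __name__ == "__main__":
    print("best overload + idle:", round(iterative_optimisation(), 3))
```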