Recurrent neural networks (RNNs) have produced significant results in many fields, such as natural language processing and speech recognition. Owing to their computational complexity and sequence dependencies, RNNs need to be deployed on customized hardware accelerators to satisfy performance and energy-efficiency constraints. However, designing hardware accelerators for RNNs is challenged by the vast design space and the reliance on ineffective optimization. An efficient automated design space exploration (DSE) strategy that can balance conflicting objectives is wanted. To address the low efficiency and insufficient universality of the resource allocation process employed for hardware accelerators, we propose an automated two-stage design space exploration (DSE) strategy for customized RNN accelerators. The strategy combines a genetic algorithm (GA) and a reinforcement learning (RL) algorithm, and it utilizes symmetrical exploration and exploitation to find the optimal solutions. In the first stage, the area of the hardware accelerator is taken as the optimization objective, and the GA is used for partial exploration purposes to narrow the design space while maintaining diversity. Then, the latency and power of the hardware accelerator are taken as the optimization objectives, and the RL algorithm is used in the second stage to find the corresponding Pareto solutions. To verify the effectiveness of the developed strategy, it is compared with other algorithms. We use three different network models as benchmarks: a vanilla RNN, LSTM, and a GRU. The results demonstrate that the strategy proposed in this paper can provide better solutions and can achieve latency, power, and area reductions of 9.35%, 5.34%, and 11.95%, respectively. The HV of GRMD is reduced by averages of 6.33%, 6.32%, and 0.67%, and the runtime is reduced by averages of 18.11%, 14.94%, and 10.28%, respectively. Additionally, given different weights, it can make reasonable trade-offs between multiple objectives.