We propose a scheme for implementing basic quantum gates with ultracold polar molecules in pendular states. The qubits are encoded in YbF molecules that are trapped in an electric field with a gradient and coupled through the dipole–dipole interaction. Time-dependent control sequences composed of multiple pulses drive the pendular qubits. To achieve high-fidelity gates, we map the control problem for the coupled molecular system onto a Markov decision process and solve it with deep reinforcement learning (DRL). By training the agents over multiple episodes, we discover optimal control pulse sequences that realize the NOT, controlled-NOT, and Hadamard gates in the two-qubit system with high fidelity. We analyze in detail the population dynamics of the YbF molecules driven by the discovered gate sequences. By combining the optimal gate sequences, we also simulate a quantum circuit that generates entanglement. Our findings could offer new insights into the efficient DRL-based control of molecular systems for practical molecule-based quantum computing.
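As a minimal illustration of the entanglement circuit mentioned above, the following sketch applies an ideal Hadamard gate and then an ideal controlled-NOT gate to the state |00⟩, which yields a Bell state. It uses textbook gate matrices only and does not model the pendular-state Hamiltonian or the discovered pulse sequences. The helper names (`kron`, `apply`, `gate_fidelity`) are hypothetical, and the trace-based fidelity is one common definition, assumed here for illustration and possibly different from the measure used in this work.

```python
import math

# Ideal (textbook) gate matrices. These are only the targets that a
# pulse-driven propagator would approximate, not the molecular dynamics.
S = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[S, S], [S, -S]]
CNOT = [[1, 0, 0, 0],   # basis order |00>, |01>, |10>, |11>;
        [0, 1, 0, 0],   # the first qubit is the control
        [0, 0, 0, 1],
        [0, 0, 1, 0]]


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def apply(u, psi):
    """Apply matrix u to state vector psi."""
    return [sum(u[i][k] * psi[k] for k in range(len(psi)))
            for i in range(len(u))]


def gate_fidelity(u_target, u_actual):
    """Assumed measure: F = |Tr(U_target^dagger U_actual)|^2 / d^2."""
    d = len(u_target)
    tr = sum(complex(u_target[k][i]).conjugate() * u_actual[k][i]
             for i in range(d) for k in range(d))
    return abs(tr) ** 2 / d ** 2


# Hadamard on the first qubit, then CNOT: |00> -> (|00> + |11>) / sqrt(2).
psi0 = [1, 0, 0, 0]
bell = apply(CNOT, apply(kron(H, I2), psi0))
populations = [abs(c) ** 2 for c in bell]
```

In a DRL setting, a quantity such as `gate_fidelity(target, propagator)` could serve as the reward for a candidate pulse sequence, with `propagator` obtained from a simulation of the driven system; that simulation is outside the scope of this sketch.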