With the advent of Industry 4.0, industrial robots have been widely adopted across sectors, for example in assembly. Peg-in-hole (PiH) assembly is among the most typical assembly tasks. In a PiH task, the robot transitions from a non-contact mode to a contact-rich mode. Instead of switching between position and force control with a pause when contact is detected, we deploy non-diagonal stiffness compliance control to plan an adaptive trajectory, improving task efficiency and ensuring contact safety. In this paper, we propose a deep reinforcement learning (DRL) method to achieve this compliance. A compliance controller based on a virtual forward dynamics (FD) model is built, and a DRL agent optimizes the parameters of the controller's non-diagonal stiffness matrix so that the generated trajectory adapts to changing contact conditions. Experiments show that the proposed method keeps the contact force within a safe range and improves the efficiency of assembly tasks.
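For illustration only (the symbols below are generic and not necessarily the paper's exact formulation), a Cartesian compliance law with a non-diagonal stiffness matrix relates the contact wrench to the pose error as

\[
\mathbf{F}_{\mathrm{ext}} \;=\; \mathbf{K}\,(\mathbf{x}_d - \mathbf{x}),
\qquad
\mathbf{K} \;=\;
\begin{bmatrix}
k_{11} & k_{12} & k_{13}\\
k_{21} & k_{22} & k_{23}\\
k_{31} & k_{32} & k_{33}
\end{bmatrix},
\]

where \(\mathbf{x}_d\) and \(\mathbf{x}\) denote the desired and actual end-effector poses. The off-diagonal entries \(k_{ij}\) (\(i \neq j\)) couple a contact force sensed along one axis to compliant motion along another, which is what allows a single stiffness matrix to adapt the trajectory during insertion; in this sketch, the DRL agent's action would be to tune these entries online.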