This paper applies reinforcement learning to joint relay selection and power allocation in a secure cognitive radio (CR) relay network, where the relay nodes are equipped with data buffers and perform full-duplex jamming. Two cases are considered: maximizing the throughput subject to delay and secrecy constraints, and maximizing the secrecy rate subject to a delay constraint. In both cases, the optimization depends on the buffer states, the interference to and from the primary user, and the delay and/or secrecy constraints, which makes the problem mathematically intractable for traditional optimization methods. We therefore use a double deep Q-network (DDQN) to solve both optimization problems. We also exploit a priori information about the CR network to improve the convergence of DDQN learning. Simulation results show that the proposed scheme significantly outperforms the traditional algorithm.