Most fault diagnosis algorithms for rotating machinery rely on labeled data, yet label information is costly to collect and fault samples are far fewer than normal samples. To address these problems, this paper proposes a deep reinforcement learning based fault diagnosis method for rotating machinery with unlabeled and imbalanced data. The method uses the relationship between each sample and the cluster centers as the reward signal. Mechanical vibration signal samples serve as the state inputs of the model, and the choice of fault type is the agent's action. This yields an interactive environment in which the agent can observe, act, and receive rewards without fault labels. A deep neural network approximates the Q function, and the optimal policy is obtained by maximizing it, so that faults can be diagnosed without labeled data. Validation on rolling bearing fault data shows that, on imbalanced data, the proposed method improves diagnostic accuracy by 15% over the K-Means algorithm.
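
The label-free interaction between agent and environment can be illustrated with a minimal sketch. It is not the implementation used in this paper: the reward rule (+1 when the chosen fault type matches the nearest cluster center, -1 otherwise), the function names `nearest_center`, `reward`, and `step`, and the toy two-dimensional features are all assumptions made for illustration, since the abstract does not specify the actual reward or feature representation.

```python
import math


def nearest_center(sample, centers):
    """Return the index of the cluster center closest to the sample."""
    distances = [math.dist(sample, center) for center in centers]
    return distances.index(min(distances))


def reward(sample, action, centers):
    """Assumed reward rule: +1 if the chosen fault type (action) is the
    index of the nearest cluster center, otherwise -1. No label is used."""
    return 1.0 if action == nearest_center(sample, centers) else -1.0


def step(samples, index, action, centers):
    """One environment step: score the action taken on the current sample,
    then present the next sample as the new state."""
    r = reward(samples[index], action, centers)
    next_index = index + 1
    done = next_index >= len(samples)
    next_state = None if done else samples[next_index]
    return next_state, r, done


# Toy data (hypothetical): two cluster centers and two feature vectors.
centers = [(0.0, 0.0), (5.0, 5.0)]
samples = [(0.2, -0.1), (4.8, 5.3)]

first = step(samples, 0, 0, centers)   # action 0 matches the nearest center
second = step(samples, 1, 0, centers)  # action 0 does not match; episode ends
```

In a setup of this kind, the rewards returned by `step` would be used to train the network that approximates the Q function, and the diagnosed fault type for a sample would be the action with the largest Q value.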