Directly combining a clustering method with reinforcement learning (RL) raises the issue that states grouped into the same cluster may follow different transition processes under the same action, resulting in poor policy performance. To address this challenge for multi-dimensional continuous observation data, an improved reinforcement learning method based on unsupervised learning is proposed within a novel framework. Instead of dimensionality reduction methods, unsupervised clustering is employed to indirectly capture the underlying structure of the data. First, the proposed framework incorporates multidimensional information, including the current observation, the next observation, and the reward, into the clustering process, yielding a more accurate and comprehensive low-dimensional discrete representation of the observation data while preserving the transition structure of the Markov decision process. Second, by compressing the observation data into a well-defined state space, the resulting cluster labels serve as low-dimensional discrete label-states for reinforcement learning, producing more effective and robust policies. Comparative analysis with state-of-the-art RL methods demonstrates that RL methods built on the framework achieve higher rewards, indicating superior performance. Furthermore, the framework is computationally efficient, as evidenced by its reasonable time complexity. This structural innovation allows for better exploration and exploitation of the transition dynamics, leading to improved policy performance in engineering applications.
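
To make the idea concrete, the following is a minimal sketch, not the authors' implementation: transition triples (current observation, next observation, reward) are clustered jointly, and the resulting cluster labels serve as discrete label-states that index an ordinary tabular value function. The use of k-means, the toy data, and names such as `n_label_states` and `to_label_state` are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch (assumptions: k-means clustering, synthetic data,
# hypothetical names n_label_states / to_label_state).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy multi-dimensional continuous transitions: (s_t, s_{t+1}, r_t).
obs      = rng.normal(size=(1000, 8))
next_obs = obs + rng.normal(scale=0.1, size=obs.shape)
reward   = rng.normal(size=(1000, 1))

# Cluster on the joint feature [s_t, s_{t+1}, r_t] so that the partition
# reflects transition and reward structure, not only the geometry of s_t.
features = np.hstack([obs, next_obs, reward])
n_label_states = 16
kmeans = KMeans(n_clusters=n_label_states, n_init=10, random_state=0).fit(features)

def to_label_state(s, s_next, r):
    """Map one transition to its low-dimensional discrete label-state."""
    x = np.hstack([s, s_next, [r]]).reshape(1, -1)
    return int(kmeans.predict(x)[0])

# The label-states can then index a tabular policy or value function,
# e.g. Q[label_state, action] in standard Q-learning.
n_actions = 4
Q = np.zeros((n_label_states, n_actions))
```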