By using signal processing and statistical analysis methods simultaneously, many heterogeneous features can be produced to describe bearing faults with more comprehensive and discriminative information. At the same time, these features may contain redundant or irrelevant information, which can degrade diagnosis performance. To address this problem, feature selection is needed to choose the most representative and discriminative features by evaluating their relevance to the fault status. However, if the structural relationships among features are not properly considered, similar or redundant features may still be selected, introducing bias into the final diagnosis model. In this paper, a new bearing fault diagnosis method based on structural feature selection is proposed to solve this problem. Based on the hypothesis that strongly related features have small coefficient distances, the proposed method aims to improve diagnosis performance by determining the group structure of the fault features. First, a new feature selection strategy is proposed by introducing a group identification matrix. Using this matrix, two evaluation criteria, one for intra-group feature correlation and one for inter-group feature difference, are constructed from coefficient distances. Minimizing the intra-group distance and maximizing the inter-group distance simultaneously yields a multi-objective 0–1 integer programming problem. Second, a multi-objective particle swarm optimization algorithm is used to solve this problem and adaptively determine the optimal group structure of the features. Finally, a support vector machine (SVM) diagnosis model is trained on the representative features extracted from these groups. Experimental results on four UCI datasets demonstrate the effectiveness of the proposed group feature selection strategy.
Moreover, experimental results on two bearing datasets (i.e., the CWRU and IMS datasets) demonstrate that the proposed method can identify the inherent group structure of fault features and achieves better diagnosis performance than several state-of-the-art methods.
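The two grouping criteria can be illustrated with a small Python sketch. This is a hypothetical illustration, not the paper's exact formulation. It assumes that each feature has a scalar coefficient `beta` (for example, from a fitted linear model) and encodes the group identification matrix as a p×K 0–1 array `G`. The intra-group criterion is taken as the sum of pairwise absolute coefficient gaps within each group, and the inter-group criterion as the smallest gap between group-mean coefficients. MOPSO does not fit in a short example, so `pareto_front` simply enumerates every assignment for very small p and keeps the non-dominated ones. The helper names `objectives`, `dominates`, `pareto_front`, and `representatives` are all illustrative.

```python
import itertools

import numpy as np


def objectives(beta, G):
    """Evaluate one grouping under ASSUMED (hypothetical) criteria.

    beta: per-feature coefficients, length p.
    G:    0-1 group identification matrix, shape (p, K); G[i, k] == 1
          means feature i belongs to group k.
    Returns (intra, inter): intra = summed pairwise |beta_i - beta_j|
    inside each group (minimise); inter = smallest gap between the mean
    coefficients of two non-empty groups (maximise).
    """
    beta = np.asarray(beta, dtype=float)
    G = np.asarray(G, dtype=int)
    if G.shape[0] != beta.size or not np.all(G.sum(axis=1) == 1):
        raise ValueError("every feature must belong to exactly one group")
    intra, centers = 0.0, []
    for k in range(G.shape[1]):
        members = beta[G[:, k] == 1]
        if members.size == 0:
            continue  # empty groups contribute nothing
        centers.append(members.mean())
        # Sum over unordered pairs: the full |diff| matrix counts each pair twice.
        intra += np.abs(members[:, None] - members[None, :]).sum() / 2.0
    if len(centers) < 2:
        inter = 0.0  # one non-empty group has no inter-group separation
    else:
        c = np.array(centers)
        gaps = np.abs(c[:, None] - c[None, :])
        inter = gaps[np.triu_indices(len(c), k=1)].min()
    return intra, inter


def dominates(a, b):
    """Pareto dominance for (intra, inter): lower intra, higher inter."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(beta, K):
    """Exhaustive stand-in for MOPSO (hypothetical): enumerate all K**p
    label assignments, which is only feasible for tiny p."""
    p = len(beta)
    candidates = []
    for labels in itertools.product(range(K), repeat=p):
        G = np.zeros((p, K), dtype=int)
        G[np.arange(p), np.array(labels)] = 1  # build the 0-1 matrix
        candidates.append((labels, objectives(beta, G)))
    return [(lab, obj) for lab, obj in candidates
            if not any(dominates(other, obj) for _, other in candidates)]


def representatives(beta, labels):
    """Per group, pick the feature whose coefficient is closest to the
    group mean (one possible reading of 'representative feature')."""
    beta = np.asarray(beta, dtype=float)
    labels = np.asarray(labels)
    reps = []
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        reps.append(int(idx[np.argmin(np.abs(beta[idx] - beta[idx].mean()))]))
    return reps
```

For example, with `beta = [0.9, 1.0, 1.1, -0.5, -0.4]` and K = 2, the grouping {0, 1, 2} / {3, 4} has the smallest intra-group distance of any two-group assignment, so it lies on the Pareto front. In practice, a search heuristic such as MOPSO would replace the exhaustive loop. An SVM would then be trained only on the features returned by `representatives`.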