Partial discharge (PD) current is a nanosecond-scale impulse that generates an electromagnetic (EM) wave with broadband frequency content, ranging from MHz up to GHz. Different PD patterns produce impulse currents of different shapes, which in turn induce EM waves with different frequency content. PD patterns can therefore be classified effectively using features extracted from the frequency domain of the EM signals. Wavelet or wavelet packet decomposition is well suited to feature selection. However, if the decomposition level is too shallow, too few effective features are found and the EM signals cannot be assigned to the correct pattern. Conversely, although a deep decomposition makes it easier to find features that distinguish the PD patterns, it produces many redundant variables, and selecting features among so many variables is difficult. This paper presents a method that selects features from the whole decomposition tree rather than only from its leaf nodes, because more potential features can be found in the whole tree. The proposed method makes it possible not only to obtain enough features but also to eliminate redundant variables effectively. To validate the method, a large set of EM signals from four PD patterns in a power transformer was acquired as training and testing data for feature selection and classification, and three common classification methods were used to classify the PD patterns with the selected features. Most of the classification results are satisfactory, indicating that the proposed method is effective.
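To illustrate what drawing candidates from the whole tree rather than from its leaves can mean, the following is a minimal sketch and not the implementation used in this paper. It assumes a Haar wavelet, a synthetic damped oscillation standing in for a measured EM signal, and relative node energy as the candidate feature; none of these choices is specified above, and the function names (`haar_split`, `wavelet_packet_tree`, `node_energies`) are hypothetical. The subsequent selection and redundancy-elimination step is not shown.

```python
import math


def haar_split(x):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_tree(signal, max_level):
    """Build the full wavelet packet tree as {node_path: coefficients}.

    Paths: "" is the root, "a"/"d" are level 1, "aa", "ad", "da", "dd"
    are level 2, and so on. Every node is kept, not only the leaves.
    """
    tree = {"": list(signal)}
    frontier = [""]
    for _ in range(max_level):
        next_frontier = []
        for path in frontier:
            approx, detail = haar_split(tree[path])
            tree[path + "a"] = approx
            tree[path + "d"] = detail
            next_frontier += [path + "a", path + "d"]
        frontier = next_frontier
    return tree


def node_energies(tree):
    """Relative energy of every non-root node (assumed feature definition)."""
    total = sum(v * v for v in tree[""])
    return {
        path: sum(v * v for v in coeffs) / total
        for path, coeffs in tree.items()
        if path  # skip the root, whose relative energy is always 1
    }


# Synthetic stand-in for a PD-induced EM pulse (assumption, not measured data).
signal = [math.exp(-n / 10.0) * math.sin(2.0 * math.pi * 0.2 * n) for n in range(64)]

tree = wavelet_packet_tree(signal, max_level=3)
features = node_energies(tree)

leaf_features = {p: e for p, e in features.items() if len(p) == 3}
print(len(features), len(leaf_features))  # 14 candidates in the whole tree vs 8 leaves
```

With three levels, the whole tree offers 2 + 4 + 8 = 14 candidate nodes, whereas the leaves alone offer 8. Because the Haar transform is orthogonal, the energy of each parent equals the sum of its two children, which is one source of the redundancy that a selection step would need to remove.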