Effectively and accurately separating genuine partial discharge (PD) signals from a large amount of noise remains a challenge. In this study, individual PD pulses were filtered, extracted, and analyzed using digital signal processing techniques and data mining methods. The shape, or distribution, of the frequency spectrum could be correlated with different PD signals. Feature extraction was explored, with K-means clustering used to group pulses by similarity. A hard threshold method was applied in the time domain so that critical PD pulses could be identified on the basis of the extracted features. With a pre-determined threshold value set, PD occurrences could be found and classified for fault diagnosis.
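
The processing chain summarized above (hard thresholding in the time domain, spectral feature extraction, K-means grouping) can be illustrated with a minimal sketch. It is not the implementation used in this study. The function names `extract_pulses`, `spectral_features`, and `kmeans`, the threshold of 0.3, the 64-sample window, and the synthetic damped-sinusoid pulses are all assumptions chosen for illustration, and the deterministic farthest-first initialization of the cluster centers is a simplification that the study does not specify.

```python
import numpy as np


def extract_pulses(signal, threshold, half_width):
    """Hard threshold: cut a fixed window around each threshold crossing."""
    above = np.flatnonzero(np.abs(signal) > threshold)
    pulses, last_end = [], -1
    for idx in above:
        if idx <= last_end:
            continue  # sample already covered by the previous window
        start, end = idx - half_width, idx + half_width
        if start < 0 or end > len(signal):
            continue  # window would run off the record
        pulses.append(signal[start:end])
        last_end = end - 1
    return np.array(pulses)


def spectral_features(pulses):
    """Magnitude spectrum of each pulse, normalized to unit sum (spectral shape)."""
    spectra = np.abs(np.fft.rfft(pulses, axis=1))
    return spectra / spectra.sum(axis=1, keepdims=True)


def kmeans(features, k, n_iter=50):
    """Plain Lloyd iterations with farthest-first initialization (assumed here)."""
    centers = [features[0]]
    for _ in range(1, k):
        dist = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centers], axis=0
        )
        centers.append(features[dist.argmax()])
    centers = np.array(centers)

    def assign(current):
        diff = features[:, None, :] - current[None, :, :]
        return np.linalg.norm(diff, axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = features[mask].mean(axis=0)
    return assign(centers), centers


# Synthetic record (hypothetical data): noise plus two alternating pulse types.
rng = np.random.default_rng(0)
signal = 0.02 * rng.standard_normal(4000)
t = np.arange(64)
for i, pos in enumerate(range(200, 3400, 400)):
    freq = 0.05 if i % 2 == 0 else 0.25  # cycles per sample
    signal[pos:pos + 64] += np.exp(-t / 10) * np.sin(2 * np.pi * freq * t)

pulses = extract_pulses(signal, threshold=0.3, half_width=32)
features = spectral_features(pulses)
labels, centers = kmeans(features, k=2)
print(pulses.shape, labels)
```

In this sketch the threshold only locates candidate pulses, while the grouping relies on spectral shape alone, so pulses of different amplitude but similar frequency content fall into the same cluster. On measured data, the threshold and the number of clusters would need to be chosen from the noise level and the expected fault types.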