Hydrogenerators are strategic assets for power utilities, and keeping them reliable and available yields significant benefits. For decades, monitoring and diagnosis of hydrogenerators have been at the core of maintenance strategies. A significant part of generator diagnosis relies on Partial Discharge (PD) measurements, because the main cause of hydrogenerator breakdown is failure of the high-voltage stator, one of the machine's major components. A study of all stator failure mechanisms reveals that more than 85 % of them involve PD activity. PD signals can be detected at the lead of the hydrogenerator while it is running, which allows on-line diagnosis. Hydro-Québec has collected more than 33 000 unlabeled PD measurement files over the last decades. Up to now, diagnosis from these measurements has been quantified using global PD amplitudes and integrated PD energy, irrespective of the source of the PD signal. Several PD sources exist, each carrying a different relative risk, but recognizing the nature of the PD, or its source, requires the judgement of experts. In this paper, we propose a new method based on visual data analysis to build a PD source classifier with a minimum of labeled data. A convolutional variational autoencoder is used to help experts visually select the best training data set and thereby improve the performance of the PD source classifier.
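
As a rough illustration of the variational autoencoder mentioned above, the sketch below shows the two ingredients that distinguish a variational autoencoder from a plain autoencoder: sampling a latent point with the reparameterization trick, and the closed-form Kullback-Leibler term that pulls the latent codes toward a standard normal distribution. It is not the architecture used in this work. The two-dimensional latent space, the function names, and the example encoder outputs are all assumptions made for illustration, and the convolutional encoder and decoder are omitted entirely.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one PD measurement (assumed 2-D latent space).
rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]

z = reparameterize(mu, log_var, rng)      # one latent point to place on a 2-D map
kl = kl_to_standard_normal(mu, log_var)   # regularization term of the VAE loss
print(len(z), kl)  # prints: 2 0.5
```

With a low-dimensional latent space of this kind, each measurement maps to a point that can be plotted directly, which is one plausible way for experts to inspect clusters visually before choosing which samples to label.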