This paper describes a study on applying data mining techniques to power transformer failure prediction. The dataset consisted not only of dissolved gas analysis (DGA) tests but also of other tests performed on the transformer’s insulating oil. This dataset presented several challenges, such as highly imbalanced classes (common in failure prediction problems) and the temporal nature of the observations. To overcome these challenges, several techniques were applied, both for prediction and to better understand the dataset. Pre-processing and the incorporation of temporality into the dataset are discussed. For prediction, one-class and two-class SVMs, decision trees, random forests, and an LSTM neural network were applied to the dataset. As the prediction performance was low (high false-positive rate), we conducted a test to ascertain whether the amount of data collected was sufficient. Results indicate that the frequency of data collection was not adequate, suggesting that the degradation period was shorter than the interval between data collections.