Data centers, as a cost-effective infrastructure for hosting Cloud and Grid applications, incur tremendous energy costs and CO2 emissions for power distribution and cooling. One effective approach for saving energy in a cluster environment is workload consolidation. However, this scheduling problem is challenging to address because it requires an understanding of various cost factors, an important one being the estimation of power consumption. The power models used in most workload scheduling solutions are linear functions of resource features, but our analysis of measurement data from our cluster shows that resource loads, in particular I/O load, have no convincing linear correlation with power consumption. Based on measurement data sets from our cluster, we propose multiple non-linear machine learning approaches to estimate the power consumption of an entire node using OS-reported resource features. We evaluate the accuracy, portability and usability of the linear and non-linear approaches. Our work shows that the multiple-variable linear regression approach is more precise than the CPU-only linear approach. The neural network approaches have a slight advantage: their mean root-mean-square error is at most 15% lower than that of the multiple-variable linear approach. However, the neural network models exhibit worse portability when models trained on one node are applied to its homogeneous nodes. The Gaussian Mixture Model approach achieves the highest accuracy on Hadoop nodes but requires the longest training time.
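To make the comparison concrete, the sketch below illustrates the general idea of fitting a CPU-only linear power model versus a multiple-variable linear model from OS-reported resource features and comparing their RMSE. It is not the paper's implementation: the feature names, synthetic power trace, and use of scikit-learn are assumptions for demonstration only.

```python
# Illustrative sketch (not the paper's implementation): fit a CPU-only linear
# model and a multiple-variable linear model of node power from OS-reported
# resource features, then compare RMSE on held-out samples.
# The feature names and synthetic data below are assumptions for demonstration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 1000
# Hypothetical OS-reported loads: CPU utilization, memory utilization, I/O rate
cpu = rng.uniform(0, 1, n)
mem = rng.uniform(0, 1, n)
io = rng.uniform(0, 1, n)
# Synthetic power trace (watts): idle power plus load-dependent terms and noise,
# with a non-linear I/O term to mimic the weak linear correlation of I/O load
power = 100 + 80 * cpu + 15 * mem + 10 * np.sqrt(io) + rng.normal(0, 3, n)

X_full = np.column_stack([cpu, mem, io])   # multiple-variable features
X_cpu = cpu.reshape(-1, 1)                 # CPU-only feature

Xf_tr, Xf_te, Xc_tr, Xc_te, y_tr, y_te = train_test_split(
    X_full, X_cpu, power, test_size=0.3, random_state=0)

cpu_only = LinearRegression().fit(Xc_tr, y_tr)
multi = LinearRegression().fit(Xf_tr, y_tr)

def rmse(model, X, y):
    """Root-mean-square error of a fitted model on held-out data."""
    return np.sqrt(mean_squared_error(y, model.predict(X)))

print("CPU-only linear RMSE:       %.2f W" % rmse(cpu_only, Xc_te, y_te))
print("Multi-variable linear RMSE: %.2f W" % rmse(multi, Xf_te, y_te))
```

The same evaluation loop could be repeated with a non-linear estimator (for example, a neural network regressor or a mixture-model-based estimator) to reproduce the kind of accuracy and training-time comparison summarized above.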