Short-term load forecasting is a key task for the planning and stability of current and future distribution grids, as it can contribute significantly to the management of the energy market for ancillary services. In this paper we introduce the beneficial properties of sparse representation and the corresponding dictionary learning when applied to the net load forecasting problem at the substation level. In this context, sparse representation theory can provide parsimonious predictive models, which are attractive mainly because they model the input space in a self-learning manner through the interplay of theory, algorithms, and applications. Several techniques are implemented, incorporating numerous dictionary learning and sparse decomposition algorithms, and a hierarchical structured model is proposed. In each case, sparsity is imposed through a different form of regularization, namely the $\ell_0$, $\ell_1$, $\ell_2$, and tree $\ell_0$ norms. The observed superiority of the proposed approach, especially the variant that embeds the atoms and their corresponding coefficients in a tree structure, stems from constructing the dictionary so that it efficiently represents the ambient electricity signal space, and from the consequent extraction of sparse basis vectors. The performance of each model is evaluated using real hourly load measurements from a high voltage/medium voltage (HV/MV) substation and compared with that of widely used machine learning methods. The results verify the effectiveness of hierarchical sparse representation in short-term load forecasting applications in terms of common accuracy indices.

INDEX TERMS Generative models, hierarchical dictionaries, load forecasting, power grid, sparse representation.