Forecasting energy demand has been a critical process in various decision support systems regarding consumption planning, distribution strategies, and energy policies. Traditionally, forecasting energy consumption or demand methods included trend analyses, regression, and auto-regression. With advancements in machine learning methods, algorithms such as support vector machines, artificial neural networks, and random forests became prevalent. In recent times, with an unprecedented improvement in computing capabilities, deep learning algorithms are increasingly used to forecast energy consumption/demand. In this contribution, a relatively novel approach is employed to use long-term memory. Weather data was used to forecast the energy consumption from three datasets, with an additional piece of information in the deep learning architecture. This additional information carries the causal relationships between the weather indicators and energy consumption. This architecture with the causal information is termed as entangled long short term memory. The results show that the entangled long short term memory outperforms the state-of-the-art deep learning architecture (bidirectional long short term memory). The theoretical and practical implications of these results are discussed in terms of decision-making and energy management systems.