Progress in Earth system science is accelerating, driven by the increasing availability of multivariate datasets, often global, at moderate to high spatio-temporal resolution. Turning these data into knowledge presents interoperability, technical, analytical, and other challenges. Earth System Data Cubes (ESDCs) have emerged as essential tools, offering analysis-ready, cloud-optimised multivariate solutions. Coupled with advances in Artificial Intelligence (AI), these solutions have the potential to unlock a wealth of information from the vast amounts of data that they contain. Applying AI methods to ESDCs promises to unpick the complexities of the Earth system by learning its underlying non-linearities in order to forecast its spatio-temporal behaviour. However, naive applications of such methods may lead to wrong conclusions and predictions. In this perspective paper, we discuss the methodological and conceptual challenges that AI applications on ESDCs bring. A particular risk lies in applications that ignore intrinsic properties of the Earth system, such as spatio-temporal autocorrelation, and may therefore deliver predictions that appear highly accurate but are flawed. Other applications may ignore known causal structures of Earth system dynamics. There are also technical challenges, such as devising adequate sampling strategies in ESDCs. Furthermore, documenting data cube provenance is essential to ensure end-to-end reproducible workflows. Effective visualisation tools are required to enable users to quickly navigate terabytes of data and develop an intuition for the spatio-temporal dynamics encoded in these cubes. Against this background, we aim to synthesise the main challenges and derive an agenda for advancing data science on data cubes to better understand global Earth system processes.