In this paper, we address the challenge of analyzing simulation data on HPC systems using Apache Spark, a Big Data framework. One of the main problems we encountered when running Spark on HPC systems is an ephemeral data explosion, brought about by the curse of persistence in the Spark framework. Data persistence is essential for reducing I/O, but it comes at the cost of storage space: we show that in some cases, Spark scratch data can consume an order of magnitude more space than the input data being analyzed, leading to fatal out-of-disk errors. We investigate the real-world application of scaling machine learning algorithms to predict and analyze failures in multi-physics simulations on 76 TB of data (over one trillion training examples), a problem 2-3 orders of magnitude larger than prior work. Based on extensive experiments at scale, we provide several concrete state-of-the-practice recommendations and demonstrate a 7x reduction in disk utilization with negligible increases, or even decreases, in runtime.
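
To make the persistence trade-off concrete, the following minimal sketch shows how the choice of Spark storage level determines whether cached partitions spill to local scratch disk. The input path and application name are hypothetical placeholders, not artifacts from our experiments.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistenceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("persistence-sketch") // hypothetical application name
      .getOrCreate()

    // Hypothetical input path standing in for the simulation feature data.
    val features = spark.read.parquet("/path/to/simulation/features")

    // MEMORY_AND_DISK (the default for Dataset.persist) spills partitions that
    // do not fit in executor memory to local scratch disk; together with shuffle
    // files, this ephemeral data can grow far beyond the size of the input.
    features.persist(StorageLevel.MEMORY_AND_DISK)

    // Alternative: MEMORY_ONLY avoids scratch-disk usage for the cache itself,
    // at the cost of recomputing partitions evicted from memory.
    // features.persist(StorageLevel.MEMORY_ONLY)

    // Reuse the persisted data across multiple actions so the cache pays off.
    val total = features.count()
    println(s"training examples: $total")

    // Release cached blocks (including any on-disk copies) when done.
    features.unpersist()
    spark.stop()
  }
}
```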