The mass distribution of dark matter haloes is the result of the hierarchical growth of initial density perturbations through mass accretion and mergers. We use an interpretable machine-learning framework to provide physical insights into the origin of the spherically-averaged mass profile of dark matter haloes. We train a gradient-boosted-trees algorithm to predict the final mass profiles of cluster-sized haloes, and measure the importance of the different inputs provided to the algorithm. We find two primary scales in the initial conditions (ICs) that impact the final mass profile: the density at approximately the scale of the haloes' Lagrangian patch 𝑅 𝐿 (𝑅 ∼ 0.7 𝑅 𝐿 ) and that in the large-scale environment (𝑅 ∼ 1.7 𝑅 𝐿 ). The model also identifies three primary time-scales in the halo assembly history that affect the final profile: (i) the formation time of the virialized, collapsed material inside the halo, (ii) the dynamical time, which captures the dynamically unrelaxed, infalling component of the halo over its first orbit, (iii) a third, most recent time-scale, which captures the impact on the outer profile of recent massive merger events. While the inner profile retains memory of the ICs, this information alone is insufficient to yield accurate predictions for the outer profile. As we add information about the haloes' mass accretion history, we find a significant improvement in the predicted profiles at all radii. Our machine-learning framework provides novel insights into the role of the ICs and the mass assembly history in determining the final mass profile of cluster-sized haloes.