Artificial intelligence (AI)-driven zero-touch massive network slicing is envisioned as a disruptive technology for beyond-5G (B5G)/6G networks, where tenancy would be extended to the final consumer in the form of advanced digital use cases. In this paper, we propose a novel model-free deep reinforcement learning (DRL) framework, called collaborative statistical Actor-Critic (CS-AC), that enables scalable and farsighted slice performance management in a 6G-like RAN scenario built upon mobile edge computing (MEC) and massive multiple-input multiple-output (mMIMO). To this end, the proposed CS-AC optimizes the latency cost under a long-term statistical service-level agreement (SLA). In particular, we consider the Q-th delay percentile as the SLA metric and enforce slice-specific preset constraints on it. Moreover, to implement the distributed learners, we propose a variant of soft Actor-Critic (SAC) with reduced hyperparameter sensitivity. Finally, we present numerical results that showcase the gain of the adopted approach in our OpenAI-based network slicing environment and verify its performance in terms of latency, SLA Q-th percentile, and time efficiency. To the best of our knowledge, this is the first work that studies the feasibility of an AI-driven approach for massive network slicing under statistical SLA.
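To make the statistical SLA notion concrete, the following minimal sketch shows one way a Q-th delay percentile constraint could be checked on a window of observed delays. It is an illustration only, not the CS-AC implementation: the function names `delay_percentile` and `sla_satisfied`, the nearest-rank percentile estimator, the millisecond unit, and the sample values are all assumptions introduced here.

```python
import math
from typing import Sequence


def delay_percentile(delays_ms: Sequence[float], q: float) -> float:
    """Empirical q-th percentile of the delay samples (nearest-rank method).

    Hypothetical helper: the estimator choice is an assumption,
    not taken from the paper.
    """
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(delays_ms)
    # 1-based rank of the smallest sample such that at least q% of the
    # samples are less than or equal to it.
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


def sla_satisfied(delays_ms: Sequence[float], q: float, bound_ms: float) -> bool:
    """True if the q-th delay percentile does not exceed the slice's bound."""
    return delay_percentile(delays_ms, q) <= bound_ms


# Illustrative usage with made-up delay samples (ms) for one slice.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p90 = delay_percentile(samples, 90)
ok = sla_satisfied(samples, 90, bound_ms=9.0)
```

In a slicing setting, each slice would carry its own pair of Q and delay bound, and a check of this kind would be evaluated over a long observation window rather than per decision step.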