Network slicing enables operators to provision multiple virtual networks on top of a single, shared physical infrastructure. Adding elasticity to slicing, i.e., the ability to provision and release dedicated network resources on demand, improves resource utilization. However, efficiently allocating and scaling slice resources while maintaining state consistency is challenging. In particular, with P4-programmable network devices that process packets at Tbps speeds, controller-driven scaling of network functions would be too time-consuming, and data-plane scaling is needed. In this paper, we address this need by developing a custom scaling protocol and framework that can scale network slices and functions consistently, with negligible delay, and transparently to the slice end-users. We compare our approach to state-of-the-art scaling techniques, via emulation and experiments on programmable hardware, and demonstrate significant improvements in slice resource utilization and reductions in scaling duration.