Network Function (NF) deployments on commodity servers have become ubiquitous in datacenters and enterprise settings. Many commonly used NFs, such as firewalls, load balancers, and NATs, are shallow, i.e., they examine only the packet's header even though the entire packet is transferred on and off the server. As a result, the gap between moved and inspected data when handling large packets exceeds 20×. At modern network rates, such excess data movement is detrimental to performance, hurting both the average and 90th-percentile tail latency of large packets by up to 1.7×. Our thorough performance analysis identifies high contention on the NIC-server PCIe interface and in the server's memory hierarchy as the main bottlenecks.

We introduce NFSlicer, a data movement optimization implemented as a NIC extension to mitigate the bottlenecks stemming from the data movement deluge in deployments of shallow NFs on commodity servers. NFSlicer transfers only the small portion of each packet that the deployed NFs actually inspect, by slicing off the packet's payload and temporarily storing it in on-NIC memory. When the server later transmits the processed packet, NFSlicer splices it back onto its previously sliced payload. We develop a software-based emulation platform and demonstrate that NFSlicer effectively minimizes data movement between the NIC and the server, bridging the latency gap between small and large packet NF processing. On a range of shallow NFs handling 1518B packets, NFSlicer reduces average and 90th-percentile tail latency by up to 17% and 29%, respectively.
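
The slice-and-splice idea described above can be illustrated with a minimal Python sketch. This is a toy model, not NFSlicer's implementation: the class `ToyNic`, its `slice` and `splice` methods, the integer tag used to match a header to its stored payload, and the 64-byte slice point `HEADER_BYTES` are all assumptions introduced here for illustration.

```python
from typing import Dict, Tuple

# Assumed slice point: how many leading bytes the NF inspects.
# Hypothetical value for illustration; not taken from the paper.
HEADER_BYTES = 64


class ToyNic:
    """Toy model of a NIC that keeps payloads in on-NIC memory."""

    def __init__(self, header_bytes: int = HEADER_BYTES) -> None:
        self.header_bytes = header_bytes
        self._payloads: Dict[int, bytes] = {}  # stands in for on-NIC memory
        self._next_tag = 0

    def slice(self, packet: bytes) -> Tuple[int, bytes]:
        """On receive: keep the payload on the NIC, return only the header."""
        tag = self._next_tag
        self._next_tag += 1
        self._payloads[tag] = packet[self.header_bytes:]
        return tag, packet[:self.header_bytes]

    def splice(self, tag: int, header: bytes) -> bytes:
        """On transmit: reattach the stored payload to the processed header."""
        return header + self._payloads.pop(tag)


def toy_shallow_nf(header: bytes) -> bytes:
    """Hypothetical shallow NF: rewrites one header byte, never sees payload."""
    return bytes([header[0] ^ 0xFF]) + header[1:]


if __name__ == "__main__":
    nic = ToyNic()
    packet = bytes(1518)                       # one large packet
    tag, header = nic.slice(packet)            # only the header crosses "PCIe"
    out = nic.splice(tag, toy_shallow_nf(header))
    print(len(packet), len(header), len(out))
    print(f"moved/inspected ratio without slicing: {len(packet) / len(header):.1f}x")
```

Under the assumed 64-byte slice point, a 1518B packet moves roughly 23.7 times more data than the NF inspects, which is in line with the gap of more than 20× stated above. The tag-based lookup is only one possible way to match a processed header to its payload; the sketch makes no claim about how NFSlicer itself performs that matching.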