Modern user-facing applications deployed in datacenters use a distributed system architecture that exacerbates the latency requirements of their constituent microservices (30-250µs). Existing CPU power-saving techniques degrade the performance of these applications due to the long transition latency (order of 100µs) to wake up from a deep CPU idle state (C-state). For this reason, server vendors recommend only enabling shallow core C-states (e.g., CC1) for idle CPU cores, thus preventing the system from entering deep package Cstates (e.g., PC6) when all CPU cores are idle. This choice, however, impairs server energy proportionality since powerhungry resources (e.g., IOs, uncore, DRAM) remain active even when there is no active core to use them. As we show, it is common for all cores to be idle due to the low average utilization (e.g., 5 − 20%) of datacenter servers running user-facing applications.We propose to reap this opportunity with AgilePkgC (APC), a new package C-state architecture that improves the energy proportionality of server processors running latencycritical applications. APC implements PC1A (package C1 agile), a new deep package C-state that a system can enter once all cores are in a shallow C-state (i.e., CC1) and has a nanosecond-scale transition latency. PC1A is based on four key techniques. First, a hardware-based agile power management unit (APMU) rapidly detects when all cores enter a shallow core C-state (CC1) and trigger the system-level power savings control flow. Second, an IO Standby Mode (IOSM) that places IO interfaces (e.g., PCIe, DMI, UPI, DRAM) in shallow (nanosecond-scale transition latency) low-power modes. Third, a CLM Retention (CLMR) rapidly reduces the CLM (Cache-and-home-agent, Last-level-cache, and Mesh network-on-chip) domain's voltage to its retention level, drastically reducing its power consumption. Fourth, APC keeps all system PLLs active in PC1A to allow nanosecond-scale exit latency by avoiding PLLs' re-locking overhead.Combining these techniques enables significant power savings while requiring less than 200ns transition latency, >250× faster than existing deep package C-states (e.g., PC6), making PC1A practical for datacenter servers. Our evaluation using Intel Skylake-based server shows that APC reduces the energy consumption of Memcached by up to 41% (25% on average) with <0.1% performance degradation. APC provides similar benefits for other representative workloads.