This paper describes design features of the high-performance CPU from a heterogeneous tri-cluster, deca-core CPU subsystem incorporated into the Helio X20 mobile SoC for smartphone applications. The SoC is fabricated in a 20nm high-κ metal-gate CMOS process and has a die size of 100mm². Additional key features of the SoC include a graphics processing unit, and multimedia (including 32MPixel/24fps camera support) and connectivity subsystems integrating 802.11ac, GPS, and multistandard cellular modems, featuring LTE FDD/TDD R11 Cat-6 with 20+20 carrier aggregation (300/50Mb/s), DC-HSPA+, TD-SCDMA, EDGE, and CDMA2000 1x/EVDO Rev. A (SRLTE).

As shown in Fig. 4.3.1, the deca-core compute function contains three separate clusters of ARMv8-A CPUs. The first cluster contains four power-efficient Cortex-A53 cores optimized for ultra-low-power (ULP) applications, while achieving a maximum frequency of 1.4GHz. The second cluster contains four Cortex-A53 cores optimized for higher performance, at 2GHz, while still maintaining low power (LP). Although the LP cluster consumes more power per operation than the ULP cluster, it retains a power-efficiency advantage over the third, high-performance (HP) cluster, which contains two Cortex-A72 cores operating at a maximum frequency of 2.5GHz and featuring out-of-order execution and a 1MB L2 cache. Heterogeneous multi-processing (HMP) is thus extended from quad-core [1] and octa-core [2], [3] designs to the tri-cluster, deca-core CPU subsystem, with automatic adjustment of CPU resources according to the system workload. A die photograph (Fig. 4.3.7) highlights the three clusters.

A plot of power vs. single-thread CPU performance for all clusters is shown in Fig. 4.3.2. In contrast to 2-cluster approaches, the newly introduced LP cluster extends ULP-CPU power-efficiency benefits upward by 40% in performance (Fig. 4.3.2).
An enhanced HMP cluster migration mechanism for the tri-cluster subsystem balances performance and power for optimal adaptation to different system workloads. Moreover, the LP cluster achieves 40% better power efficiency for applications requiring a performance level that previously could only be met by the HP cluster in a dual-cluster CPU computing subsystem. Power/frequency optimization targets for the three clusters are designed to provide continuous operation across a 10× dynamic performance range, while incurring only up to a 4× power difference.

Adaptive power allocation (APA) is used to maximize CPU performance within the currently allocated power budget by instantly re-allocating power from low-activity CPUs to high-activity CPUs, thus avoiding performance throttling on high-activity CPUs. In scenarios where the cumulative power of all CPUs exceeds the total cluster budget, automatic clock gating is introduced as a temporary countermeasure and is achieved by clock dithering (APA-CD). When APA-CD is active, a secondary process adjusts the on-chip PLL frequency and off-chip DC-to-DC converter voltage to a more energy-efficient operating point in order to maximize performance (MP); this process is called APA...
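
The budget-sharing idea behind APA and APA-CD can be illustrated with a minimal sketch. It is not the hardware implementation: the function name `allocate_power`, the `Allocation` record, the milliwatt figures, and the assumption that dynamic power scales linearly with the fraction of clock cycles left ungated are all hypothetical and chosen only for illustration. The sketch treats the budget as a single pool per cluster, so a busy CPU may draw the headroom left by idle ones, and dithering applies only when the cluster total exceeds the budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Allocation:
    per_cpu_mw: List[float]  # power granted to each CPU, in mW (hypothetical unit)
    dither_duty: float       # fraction of clock cycles left ungated (1.0 = no dithering)


def allocate_power(demand_mw: List[float], budget_mw: float) -> Allocation:
    """Share one cluster-level budget among CPUs according to their demand.

    Assumption: power drawn by a CPU scales linearly with the fraction of
    clock cycles that are not gated, so a common duty factor brings the
    cluster total back to the budget.
    """
    if budget_mw <= 0:
        raise ValueError("budget must be positive")
    total = sum(demand_mw)
    if total <= budget_mw:
        # Low-activity CPUs leave headroom, so every CPU gets its full demand.
        return Allocation(list(demand_mw), 1.0)
    # Over budget: gate a fraction of clock cycles as a temporary countermeasure.
    duty = budget_mw / total
    return Allocation([d * duty for d in demand_mw], duty)


if __name__ == "__main__":
    # One busy CPU and three nearly idle ones: a fixed per-CPU cap of
    # 2000 / 4 = 500 mW would throttle CPU 0, whereas the pooled budget does not.
    print(allocate_power([900, 100, 100, 100], budget_mw=2000))
    # All CPUs busy: the cluster total exceeds the budget, so dithering engages.
    print(allocate_power([900, 900, 400, 200], budget_mw=1200))
```

In the first call the cluster total is within the budget, so the busy CPU keeps its full demand and no dithering occurs. In the second call the total is twice the budget, so the sketch gates half of the clock cycles on every CPU. A uniform duty factor is the simplest choice here; the text does not state how dithering is distributed across CPUs.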