This paper describes design features of the high-performance CPU from a heterogeneous tri-cluster, deca-core CPU subsystem incorporated into the Helio X20 mobile SoC for smartphone applications. The SoC is fabricated in a 20nm high-κ metal-gate CMOS process and has a die size of 100mm². Additional key features of the SoC include a graphics processing unit, multimedia (including 32MPixel/24fps camera support), and connectivity subsystems integrating 802.11ac, GPS, and multistandard cellular modems, featuring LTE FDD/TDD R11 Cat-6 with 20+20 carrier aggregation (300/50Mb/s), DC-HSPA+, TD-SCDMA, EDGE, and CDMA2000 1x/EVDO Rev. A (SRLTE).

As shown in Fig. 4.3.1, the deca-core compute function contains three separate clusters of ARMv8-A CPUs. The first cluster contains four power-efficient Cortex-A53 cores optimized for ultra-low-power (ULP) applications, while achieving a maximum frequency of 1.4GHz. The second cluster contains four Cortex-A53 cores optimized for higher performance, at 2GHz, while still maintaining low power (LP). Although the LP cluster has a higher power per operation than the ULP cluster, it still maintains a power-efficiency advantage over the third, high-performance (HP) cluster, which contains two Cortex-A72 cores operating at a maximum frequency of 2.5GHz and featuring out-of-order execution and a 1MB L2 cache. Heterogeneous multi-processing (HMP) is further extended from quad-core [1] and octa-core [2], [3] designs to the tri-cluster, deca-core CPU subsystem, with automatic adjustment of CPU resources according to the system workload. A die photograph (Fig. 4.3.7) highlights the three clusters.

A plot of power vs. single-thread CPU performance for all clusters is shown in Fig. 4.3.2. In contrast to 2-cluster approaches, the newly introduced LP cluster extends ULP-CPU power-efficiency benefits upward by 40% in performance.
An enhanced HMP cluster migration mechanism for the tri-cluster subsystem balances performance and power for optimal adaptation to different system workloads. Moreover, the LP cluster achieves 40% better power efficiency for applications requiring a performance level that previously could only be fulfilled by the HP cluster in a dual-cluster CPU computing subsystem. Power/frequency optimization targets for the three clusters are designed to provide continuous operation across a 10× dynamic performance range, but at only up to a 4× power difference.

Adaptive power allocation (APA) is used to maximize CPU performance within the currently allocated power budget by instantly re-allocating power from low-activity CPUs to high-activity CPUs, thus avoiding performance throttling on high-activity CPUs. In scenarios where the cumulative power of all CPUs exceeds the total cluster budget, automatic clock gating is introduced as a temporary countermeasure and is achieved by clock dithering (APA-CD). When APA-CD is active, a secondary process adjusts the on-chip PLL frequency and off-chip DC-to-DC converter voltage to a more energy-efficient operating point in order to maximize performance (MP); this process is called APA...
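
The power re-allocation idea behind APA can be illustrated with a minimal software sketch. This is not the hardware mechanism described in the paper: the function name `reallocate_power`, the equal starting share per CPU, the proportional hand-out of surplus, and the milliwatt figures are all assumptions chosen for illustration. The sketch only shows how power unused by low-activity CPUs might be granted to high-activity CPUs, and how a flag could signal that total demand exceeds the cluster budget, the condition under which the text says clock dithering is applied.

```python
def reallocate_power(demand_mw, budget_mw):
    """Hypothetical sketch: share a cluster power budget among CPUs.

    demand_mw -- per-CPU power demand (illustrative units, mW)
    budget_mw -- total budget for the cluster (illustrative units, mW)
    Returns (grants, over_budget); over_budget is True when the summed
    demand exceeds the budget, i.e. when a countermeasure such as
    clock dithering would be needed.
    """
    n = len(demand_mw)
    equal_share = budget_mw / n

    # Each CPU first receives its demand, capped at an equal share.
    grants = [min(d, equal_share) for d in demand_mw]

    # Power left unused by low-activity CPUs forms a surplus.
    surplus = budget_mw - sum(grants)

    # High-activity CPUs still short of their demand split the surplus
    # in proportion to their shortfall (an assumed policy).
    shortfall = [d - g for d, g in zip(demand_mw, grants)]
    total_short = sum(shortfall)
    if total_short > 0:
        give = min(surplus, total_short)
        grants = [g + give * s / total_short
                  for g, s in zip(grants, shortfall)]

    over_budget = sum(demand_mw) > budget_mw
    return grants, over_budget


if __name__ == "__main__":
    # Two idle CPUs and two busy CPUs sharing a 1600 mW budget.
    print(reallocate_power([100, 100, 500, 700], 1600))
    # Same budget, but the busy CPUs now ask for more than is available.
    print(reallocate_power([100, 100, 700, 900], 1600))
```

In the first call the busy CPUs receive their full demand because the idle CPUs leave enough headroom; in the second call the grants sum exactly to the budget and the flag is set, which corresponds to the case the text handles with APA-CD.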