<p>The rapid development of deep learning has propelled many real-world artificial intelligence (AI) applications. Many of these applications integrate multiple neural network (multi-NN) models to provide different functions. Although a number of multi-NN acceleration technologies have been explored, few deliver the flexibility and scalability required by emerging and diverse AI workloads, especially on mobile devices. Among these, homogeneous multi-core architectures have great potential to support multi-NN execution by leveraging decentralized parallelism and intrinsic scalability. However, the advantages of multi-core systems are underexploited because they adopt bulk synchronization parallelism (BSP), which copes poorly with the diversity of multi-NN workloads. This paper reports a hierarchical multi-core architecture with asynchronous parallelism that improves the performance and utilization of multi-NN execution. Its theoretical foundation is hierarchical asynchronous parallel (HASP), which establishes a programmable, grouped, dynamic synchronous-asynchronous framework for multi-NN acceleration. HASP can be implemented on a typical multi-core processor for multi-NN with minor modifications. We further developed a prototype chip to validate the hardware effectiveness of this design. We also developed a mapping strategy that combines spatial partitioning and temporal tuning, which allows the proposed architecture to improve resource utilization and throughput simultaneously.</p>