Therefore, to benchmark the computing capabilities of the cluster on real-life, end-to-end DNN models, we exploit our previous experience with explicit memory management and data tiling techniques [25], and with the deployment of real-sized DNN models on application chips such as Vega [23], to build a model of the system, with a larger L2 memory, on which we run the experiments. The hardware-oriented description of the SoC is integrated into our open-source event-based emulator, GVSOC [33]. To run the experiments, the following measurements and considerations are taken: 1) we simulate the execution of the inference task on GVSOC; 2) as expected, during the execution of the inference task we are never memory-bound; therefore, the contribution of the L2-to-L1 (and vice versa) data movements is relevant only to the total energy consumption; 3) we conduct silicon measurements, in terms of latency and energy, of all the L2-to-L1 data transfers (and vice versa) necessary to compute each tile, as determined by the GVSOC simulations, and include these measurements in the model; 4) we conduct silicon measurements, in terms of latency and energy, of all the kernels necessary to compute each tile generated by the Dory framework, and likewise include them in the model. The layer-wise compute time and energy of the inference task are shown in Figure 14.
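The aggregation described above can be sketched as follows. This is a minimal, hypothetical model (the `TileMeasurement` and `layer_cost` names are ours, not from the Dory framework or GVSOC): per-tile silicon measurements of kernel and DMA cost are summed into layer-wise latency and energy, and, since execution is reported as compute-bound, the L2-to-L1 transfers are assumed to overlap with computation and thus contribute only to energy, not latency.

```python
from dataclasses import dataclass

@dataclass
class TileMeasurement:
    """Silicon measurements for one tile (hypothetical structure)."""
    kernel_cycles: int       # measured kernel latency, in cycles
    kernel_energy_uj: float  # measured kernel energy, in microjoules
    dma_cycles: int          # measured L2<->L1 transfer latency, in cycles
    dma_energy_uj: float     # measured L2<->L1 transfer energy, in microjoules

def layer_cost(tiles, memory_bound=False):
    """Aggregate per-tile measurements into layer-wise latency and energy.

    When the layer is compute-bound (the case reported in the text),
    DMA transfers are hidden behind computation, so they add energy
    but no latency; energy always sums both contributions.
    """
    if memory_bound:
        latency = sum(t.kernel_cycles + t.dma_cycles for t in tiles)
    else:
        latency = sum(t.kernel_cycles for t in tiles)
    energy = sum(t.kernel_energy_uj + t.dma_energy_uj for t in tiles)
    return latency, energy

# Example with illustrative (not measured) numbers for a two-tile layer.
tiles = [TileMeasurement(120_000, 4.2, 30_000, 0.6),
         TileMeasurement(118_000, 4.1, 30_000, 0.6)]
lat, en = layer_cost(tiles)
```

In this sketch, the latency for the compute-bound case counts only kernel cycles (238,000 here), while the energy accumulates both kernel and DMA contributions, mirroring consideration 2) above.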