2018 IEEE International Symposium on High Performance Computer Architecture (HPCA) 2018
DOI: 10.1109/hpca.2018.00057

SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters

Cited by 15 publications (6 citation statements)
References 32 publications
“…In October 2018, the Institute of Computing Technology, Chinese Academy of Sciences (ICT) launched the first High-Throughput computer at the China National Computer Congress (CNCC). High-throughput computer is integrated with a series of innovative technologies of ICT, including High-throughput many-core architecture (Fan et al 2018), High-Throughput on-chip data paths and Labeled von Neumann Architecture (Dongrui et al 2019). Each processing core adopts variable-length pipeline and multi-transmission structure, which significantly improves the instruction throughput efficiency.…”
Section: High Throughput Clusters (mentioning)
confidence: 99%
“…processor consists of processing elements (PEs), data buffers (Dbufs), instruction buffers (Cbufs), a micro-controller unit (MicC), and a DMA controller. The structure of SPU is similar to traditional many-core architectures (Fan et al 2012, 2018) but every PE inside the array is a fine-grained dataflow processing element not a control-flow core. The Dbufs and Cbufs are implemented by using Scratch-Pad Memory (SPM).…”
Section: SPU Structure (mentioning)
confidence: 99%
“…There are several reasons that cause high memory access latency in convolution operations. In the case of DDR memory type, the main methods of reducing the latency of memory access problems of convolutional operations include increasing the utilization rate of on-chip buffer resources [11] [12], using more off-chip memory [13] [14] or improving the utilization of off-chip memory [12], etc.…”
Section: Introduction (mentioning)
confidence: 99%