2017
DOI: 10.1109/mm.2017.36

Piton: A Manycore Processor for Multitenant Clouds

Cited by 17 publications (7 citation statements)
References: 0 publications

“…GPUs feature thousands of compute units, but their utilization is limited by the SIMT regime. Especially for irregular and non-data-oblivious algorithms, thread divergence and synchronization restrict the GPU's performance. In contrast, MemPool gives each PE its instruction stream, making it much more flexible and efficient on irregular workloads.…”

Table rows embedded in the excerpt (the column headers are not part of the captured text):

RAW [24]                32-bit MIPS-style   -     16      ✓ ✓ ✓   × × ×   × × ×
Celerity [25]           32-bit RISC-V       -     496 *   ✓ ✓ ✓   × × ×   ✓ ✓ ✓
KiloCore [26]           40-bit RISC         -     1000    ✓ ✓ ✓   × × ×   × × ×
Piton [27]              64-bit SPARC V9     -     25      ✓ ✓ ✓   × × ×   ✓ ✓ ✓
TILE64 [28]             64-bit VLIW         -     64      ✓ ✓ ✓   × × ×   × × ×
Epiphany-V [29]         64-bit RISC         -     1024    ✓ ✓ ✓   × × ×   × × ×
Pixel Visual Core [6]   16-bit VLIW         256   2048    × × ×   × × ×   × × ×
crossbar-based
GAP9 [11]               32-bit RISC-V       9     9       ✓ ✓ ✓   ✓ ✓ ✓   ≈ ≈ ≈
RC64 [13]               32-bit VLIW         64    64      ✓ ✓ ✓   ✓ ✓ ✓   × × ×
Manticore [14]          32-bit RISC-V       8     4096    ✓ ✓ ✓   × × ×   ✓ ✓ ✓
MPPA3 [12]              64-bit VLIW         16    80      ✓ ✓ ✓   × × ×   × × ×
ET-SoC-1 [15]           64-bit RISC-V       32    1088    ✓ ✓ ✓   × × ×   × × ×
H1000 [4]               32/64-bit PTX       128   18432

Section: Related Work (mentioning)
confidence: 99%
“…The Piton processor prototype [12,13] was manufactured in March 2015 on IBM's 32 nm SOI process with a target clock frequency of 1 GHz. It features 25 tiles in a 5 × 5 mesh on a 6 mm × 6 mm (36 mm²) die.…”
Section: The Princeton Piton Processor (mentioning)
confidence: 99%
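
The figures in the statement above (25 tiles, 5 × 5 mesh, 6 mm × 6 mm die) can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the row-major tile numbering and the Manhattan-distance hop count are assumptions made for the example, not details given in the quoted text, and all function names are hypothetical.

```python
# Illustrative sketch only. The mesh dimensions and die size come from the
# quoted statement; the row-major tile numbering and the Manhattan-distance
# hop count are assumptions made for this example.

MESH_WIDTH = 5     # tiles per row (quoted: 5 x 5 mesh)
MESH_HEIGHT = 5    # tiles per column
DIE_SIDE_MM = 6.0  # die edge length in mm (quoted: 6 mm x 6 mm)


def tile_count() -> int:
    """Total number of tiles in the mesh."""
    return MESH_WIDTH * MESH_HEIGHT


def die_area_mm2() -> float:
    """Die area in square millimetres."""
    return DIE_SIDE_MM * DIE_SIDE_MM


def tile_coords(tile_id: int) -> tuple:
    """Map a tile id to (x, y), assuming row-major numbering from 0."""
    if not 0 <= tile_id < tile_count():
        raise ValueError(f"tile_id must be in [0, {tile_count()})")
    return tile_id % MESH_WIDTH, tile_id // MESH_WIDTH


def mesh_hops(src: int, dst: int) -> int:
    """Minimum hop count between two tiles on a 2D mesh (Manhattan distance)."""
    sx, sy = tile_coords(src)
    dx, dy = tile_coords(dst)
    return abs(sx - dx) + abs(sy - dy)


if __name__ == "__main__":
    print("tiles:", tile_count())
    print("die area (mm^2):", die_area_mm2())
    print("corner-to-corner hops:", mesh_hops(0, tile_count() - 1))
```

Under these assumptions the tile count and die area reproduce the quoted 25 tiles and 36 mm², and the corner-to-corner distance gives a rough sense of the mesh diameter.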
“…OpenPiton enables research from the small to the large with demonstrated implementations from the slimmed-down, single-core PicoPiton, which is emulated on a $160 Xilinx Artix 7 at 29.5 MHz, up to the 25-core Piton processor which targeted a 1 GHz operating point and was recently validated and thoroughly characterized. [12,13] The OpenPiton platform shown in Figure 1 is a modern, tiled, manycore design consisting of a 64-bit architecture using the mature SPARC v9 ISA with P-Mesh: our scalable cache coherence protocol and network on chip (NoC). OpenPiton builds upon the industry-hardened, open-source OpenSPARC T1 [15,1,18] core, but sports a completely scratch-built uncore (caches, cache-coherence protocol, NoCs, NoC-based I/O bridges, etc.), a new and modern simulation framework, configurable and portable FPGA scripts, a complete set of scripts enabling synthesis and implementation of ready-to-manufacture chips, and full-stack multiuser Debian Linux support.…”
Section: Introduction (mentioning)
confidence: 99%
“…Similar to our work, Morpheus [30] has exploited lowered performance variance to improve cluster utilization, but through automated SLOs as opposed to market incentives. Finally, many solutions have been proposed at the architecture-level to enable better utilization of underlying cloud processors [7,47]. 2) Self-Adaptation and Graceful Degradation.…”
Section: Related Work (mentioning)
confidence: 99%