An evaluation of the TRIPS computer system

Gebhart, Mark; Maher, Bertrand A.; Coons, Katherine E.; Diamond, Jeff; Gratz, Paul V.; Marino, Mario Donato; Ranganathan, Nitya; Robatmili, Behnam; Smith, A. Gordon; Burrill, James; Keckler, Stephen W.; Burger, Doug; McKinley, Kathryn S.

doi:10.1145/2528521.1508246

Cited by 21 publications

(25 citation statements)

References 24 publications


“…In fact, parallelism breaks the computation into finer-grain chunks on separate regions. This reduces global memory accesses by leveraging local storage in regions and optional local memories, which is analogous to spatial computing approaches in classical computing [14,45,46].…”

Section: The Multi-SIMD Architectural Model (mentioning)

confidence: 99%

Compiler Management of Communication and Parallelism for Quantum Computation

Heckey, Patil, Javadi-Abhari, et al. 2015

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems


Quantum computing (QC) offers huge promise to accelerate a range of computationally intensive benchmarks. Quantum computing is limited, however, by the challenges of decoherence: i.e., a quantum state can only be maintained for short windows of time before it decoheres. While quantum error correction codes can protect against decoherence, fast execution time is the best defense against decoherence, so efficient architectures and effective scheduling algorithms are necessary. This paper proposes the Multi-SIMD QC architecture and then proposes and evaluates effective schedulers to map benchmark descriptions onto Multi-SIMD architectures. The Multi-SIMD model consists of a small number of SIMD regions, each of which may support operations on up to thousands of qubits per cycle. Efficient Multi-SIMD operation requires efficient scheduling. This work develops schedulers to reduce communication requirements of qubits between operating regions, while also improving parallelism. We find that communication to global memory is a dominant cost in QC. We also note that many quantum benchmarks have long serial operation paths (although each operation may be data parallel). To exploit this characteristic, we introduce Longest-Path-First Scheduling (LPFS), which pins operations to SIMD regions to keep data in-place and reduce communication to memory. The use of small, local scratchpad memories also further reduces communication. Our results show a 3% to 308% improvement for LPFS over conventional scheduling algorithms, and an additional 3% to 64% improvement using scratchpad memories. Our work is the most comprehensive software-to-quantum toolflow published to date, with efficient and practical scheduling techniques that reduce communication and increase parallelism for full-scale quantum code executing up to a trillion quantum gate operations.

ASPLOS '15, March 14-18, 2015, Istanbul, Turkey.



“…Second, the latency cost of explicit data communication can be prohibitive [3]. Third, compilation challenges have proven hard to surmount [9]. Overall, dataflow machines researched and implemented thus far have failed to provide higher instruction-level parallelism, and their theoretical promise of low power and yet high performance remains unrealized for irregular codes.…”

Section: Figure 3: Potential Of Ideal Explicit-dataflow Specialization (mentioning)

confidence: 99%

Exploring the potential of heterogeneous von Neumann/dataflow execution models

Tony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam 2015

SIGARCH Comput. Archit. News


General purpose processors (GPPs), from small in-order designs to many-issue out-of-order, incur large power overheads which must be addressed for future technology generations. Major sources of overhead include structures which dynamically extract the data-dependence graph or maintain precise state. Considering irregular workloads, current specialization approaches either heavily curtail performance, or provide simply too little benefit. Interestingly, well-known explicit-dataflow architectures eliminate these overheads by directly executing the data-dependence graph and eschewing instruction-precise recoverability. However, even after decades of research, dataflow architectures have yet to come into prominence as a solution. We attribute this to a lack of effective control speculation and the latency overhead of explicit communication, which is crippling for certain codes. This paper makes the observation that if both out-of-order and explicit-dataflow were available in one processor, many types of GPP cores can benefit from dynamically switching during certain phases of an application's lifetime. Analysis reveals that an ideal explicit-dataflow engine could be profitable for more than half of instructions, providing significant performance and energy improvements. The challenge is to achieve these benefits without introducing excess hardware complexity. To this end, we propose the Specialization Engine for Explicit-Dataflow (SEED). Integrated with an in-order core, we see 1.67× performance and 1.65× energy benefits, with an Out-Of-Order (OOO) dual-issue core we see 1.33× and 1.70×, and with a quad-issue OOO, 1.14× and 1.54×.


“…The following is our solution procedure, where numbers refer to constraints from the formulation: (1) min SVC s.t. [1,2,3,4,5,6,7,8,10,14,15,16,17,18,21,22,23]; (2) min LAT s.t. [1,2,3,4,5,6,7,8,10,14,15,16,17,18,21,22,23] and SVC = SVC_optimal…”

Section: Rf (mentioning)

confidence: 99%

A general constraint-centric scheduling framework for spatial architectures

Nowatzki, Sartin-Tarm, Carli, et al. 2013

Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation

Self Cite


Specialized execution using spatial architectures provides energy-efficient computation, but requires effective algorithms for spatially scheduling the computation. Generally, this has been solved with architecture-specific heuristics, an approach which suffers from poor compiler/architect productivity, lack of insight on optimality, and inhibits migration of techniques between architectures. Our goal is to develop a scheduling framework usable for all spatial architectures. To this end, we express spatial scheduling as a constraint satisfaction problem using Integer Linear Programming (ILP). We observe that architecture primitives and scheduler responsibilities can be related through five abstractions: placement of computation, routing of data, managing event timing, managing resource utilization, and forming the optimization objectives. We encode these responsibilities as 20 general ILP constraints, which are used to create schedulers for the disparate TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using ILP is implementable, practical, and typically matches or outperforms specialized schedulers.
