GraphStep: A System Architecture for Sparse-Graph Algorithms

deLorimier, Michael; Kapre, Nachiket; Mehta, Neelesh B.; Rizzo, Dominic; Eslick, Ian; Rubin, Raphael; Uribe, Tomás E.; Knight, Thomas Jr.; DeHon, André

doi:10.1109/fccm.2006.45

Cited by 71 publications

(57 citation statements)

References 37 publications

“…A good amount of literature deals with the design of BFS solutions, either based on commodity processors [11], [12] or special purpose hardware [13], [14], [15], [16]. Some recent publications describe successful parallelization strategies of list ranking [17] and phylogenetic trees on the Cell BE [18].…”

Section: Introduction (mentioning)

confidence: 99%

Scalable Graph Exploration on Multicore Processors

Agarwal

Petrini

Pasetto

et al. 2010

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

194

151

Abstract: Many important problems in computational sciences, social network analysis, security, and business analytics are data-intensive and lend themselves to graph-theoretical analyses. In this paper we investigate the challenges involved in exploring very large graphs by designing a breadth-first search (BFS) algorithm for advanced multi-core processors that are likely to become the building blocks of future exascale systems. Our new methodology for large-scale graph analytics combines a high-level algorithmic design that captures the machine-independent aspects, to guarantee portability with performance to future processors, with an implementation that embeds processor-specific optimizations. We present an experimental study that uses state-of-the-art Intel Nehalem EP and EX processors and up to 64 threads in a single system. Our performance on several benchmark problems representative of the power-law graphs found in real-world problems reaches processing rates that are competitive with supercomputing results in the recent literature. In the experimental evaluation we prove that our graph exploration algorithm running on a 4-socket Nehalem EX is (1) 2.4 times faster than a Cray XMT with 128 processors when exploring a random graph with 64 million vertices and 512 million edges, (2) capable of processing 550 million edges per second with an R-MAT graph with 200 million vertices and 1 billion edges, comparable to the performance of a similar graph on a Cray MTA-2 with 40 processors, and (3) 5 times faster than 256 BlueGene/L processors on a graph with average degree 50.
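For readers unfamiliar with the kernel this abstract refers to, the following is a minimal sequential sketch of level-by-level BFS in Python. It is not the multicore algorithm from the cited paper: the function name `bfs_levels` and the dictionary-based adjacency format are assumptions chosen for illustration, and none of the paper's parallel or processor-specific optimizations are shown.

```python
def bfs_levels(adj, source):
    """Return {vertex: hop distance from source} for every reachable vertex.

    adj is a hypothetical adjacency mapping: vertex -> iterable of neighbours.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Expand the whole current frontier before moving one level deeper.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in level:  # first visit fixes the BFS level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
    print(bfs_levels(graph, 0))
```

Processing one frontier at a time is the structure that parallel BFS implementations typically distribute across threads, since every vertex in a frontier can be expanded independently.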

“…The FPGA implementation scales well to, at least, tens of leaf processing FPGAs. See [deLorimier06] for further details on the Concept Net implementation. [Bellman58] is a single-source shortest path algorithm which robustly handles negative edge weights.…”

Section: Results (mentioning)

confidence: 99%

“…Parallel versions could, potentially, reduce the per processor working set; however, communication often ends up dominating computation due to high end-to-end network latency and high network contention. Our FPGA-based Graph Machine implementation is able to perform better because of the high memory bandwidth [deLorimier06] and low PE-to-PE latency. On a Virtex4-LX160-12, we are able to place 16 double-precision floating-point PEs which operate at 285MHz each.…”

Section: Bellman-Ford (mentioning)

confidence: 99%
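The two statements above refer to Bellman-Ford as a single-source shortest-path algorithm that tolerates negative edge weights. A minimal sequential sketch in Python follows. It is the textbook formulation, not the FPGA Graph Machine implementation described in the citing paper; the function name `bellman_ford` and the `(u, v, weight)` edge-list format are assumptions made for illustration.

```python
def bellman_ford(num_vertices, edges, source):
    """Single-source shortest paths; edges is a list of (u, v, weight) tuples.

    Negative weights are allowed. Raises ValueError if a negative cycle
    is reachable from the source.
    """
    inf = float("inf")
    dist = [inf] * num_vertices
    dist[source] = 0
    # At most num_vertices - 1 relaxation rounds are needed.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:  # distances have converged early
            break
    # Any edge that can still be relaxed implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist


if __name__ == "__main__":
    example_edges = [(0, 1, 4), (0, 2, 5), (2, 1, -3), (1, 3, 2)]
    print(bellman_ford(4, example_edges, 0))
```

Each relaxation round touches every edge independently, which is one reason the algorithm maps naturally onto per-edge or per-node parallel hardware.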

The Design of a Polymorphous Cognitive Agent Architecture (PCAA)

Amduka

Russo

Jha

et al. 2008

Self Cite

PCAA is a dynamic, adaptive cognitive architecture that makes previously intractable approximation tasks tractable for NP-hard cognitive problems. PCAA consists of: linear composable cognitive agents, a cognitive mark-up language for cognitive behavior definition, a cognitive layer for derivation of cognitive services and specialized cognitive agents, and a next generation polymorphic hardware and software layer for runtime composition and instantiation of cognitive agents.

Sponsoring/monitoring agency: AFRL

Our approach included a comprehensive concept study in the context of representative DoD challenge problems that have a clear and well-defined need for ACIP technology. PCAA application experiments demonstrated clear performance improvements over traditional computing architectures for cognitive processing for these applications.

Our innovations include:
• Dynamically composable hardware and software with linear scalability for cognitive processing across a massively parallel hardware fabric for real time autonomous systems.
• A dynamically composed agent architecture that partitions reactive and predefined behaviors into linear lower level cognitive agents that tailor and adapt the overall behavior of the computing architecture to immediate mission needs.
• Run-time derived cognitive virtual machines to partition cognitive processing to a new generation of computing run-time configured hardware and software to allow for dynamic cognitive computing reconfiguration required to achieve reactive processing.

Our research was driven by DoD applications that have demonstrated needs for diverse cognitive processing that cannot be addressed by current computing hardware and software architectures. We demonstrated our end-to-end approach for two applications with direct DoD relevance: control of autonomous Unmanned Aerial Vehicles and Intelligence Analysis.

“…We schedule communication between the nodes using a greedy time-multiplexed router that uses A* routing. We developed this scheduler and router as part of the Graph Machine project [19], [28], [32].…”

Section: B Tool Flow (mentioning)

confidence: 99%

Accelerating SPICE Model-Evaluation using FPGAs

Kapre

DeHon

2009

2009 17th IEEE Symposium on Field Programmable Custom Computing Machines

Self Cite

Abstract: Single-FPGA spatial implementations can provide an order of magnitude speedup over sequential microprocessor implementations for data-parallel, floating-point computation in SPICE model-evaluation. Model-evaluation is a key component of the SPICE circuit simulator and it is characterized by large irregular floating-point compute graphs. We show how to exploit the parallelism available in these graphs on single-FPGA designs with a low-overhead VLIW-scheduled architecture. Our architecture uses spatial floating-point operators coupled to local high-bandwidth memories and interconnected by a time-shared network. We retime operation inputs in the model-evaluation to allow independent scheduling of computation and communication. With this approach, we demonstrate speedups of 2-18× over a dual-core 3GHz Intel Xeon 5160 when using a Xilinx Virtex 5 LX330T for a variety of SPICE device models.
