2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps.2017.116

Accelerating Graph and Machine Learning Workloads Using a Shared Memory Multicore Architecture with Auxiliary Support for In-hardware Explicit Messaging

Cited by 15 publications (6 citation statements); references 26 publications.

Citation statements (ordered by relevance):
“…The latter condition is to ensure long-term fairness, which avoids starvation of threads from other nodes. Before the server becomes a normal thread, it first releases the global and local locks (lines 30–31). If the MCS queue has other waiting server threads, the current server thread hands over ownership of the global lock to the very next server thread, which then proceeds to handle its local requests.…”
Section: Implementation Details
Citation type: mentioning (confidence: 99%)
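The handoff described in this excerpt builds on the classic MCS queue lock, where a releasing thread passes ownership directly to its queued successor. The sketch below is a minimal, generic MCS lock in C11 atomics, included only to illustrate that successor handoff; it is not the pLock global/local hierarchy or the server-thread logic of the cited work, and all names are illustrative.

```c
/* Minimal MCS queue lock (C11 atomics). Release hands ownership directly
 * to the very next node in the queue, as in the handoff the excerpt
 * describes. Generic sketch only, not the pLock hierarchy. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;               /* true while this waiter must spin */
} mcs_node_t;

typedef struct {
    _Atomic(mcs_node_t *) tail;       /* last node in the queue, NULL if free */
} mcs_lock_t;

void mcs_acquire(mcs_lock_t *lock, mcs_node_t *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* enqueue ourselves at the tail */
    mcs_node_t *pred = atomic_exchange(&lock->tail, me);
    if (pred != NULL) {
        /* a predecessor exists: link in and spin on our own flag only */
        atomic_store(&pred->next, me);
        while (atomic_load(&me->locked))
            ;
    }
}

void mcs_release(mcs_lock_t *lock, mcs_node_t *me) {
    mcs_node_t *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* no visible successor: try to reset the queue to empty */
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;                   /* queue drained, lock is free */
        /* a successor is enqueueing; wait until it links itself */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    /* hand ownership directly to the very next waiter in the queue */
    atomic_store(&succ->locked, false);
}
```

Because each waiter spins only on its own node's flag, the handoff touches a single flag owned by the successor, which is what makes direct queue handoff attractive on shared-memory multicores.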
“…MCSTP [19], MCSCR [1], and CST [23] address the preemption issue by employing a sleep-and-wakeup approach. pLock [30] is a variant of an explicit inter-core message passing (EMP)-based lock [31] augmented with chaining and hierarchical features. The basic concept of an EMP-based lock is to use a dedicated core as a server, and the other cores as clients that request the lock from the server.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
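To make the client–server idea concrete, below is a minimal software sketch of an EMP-style lock, with all names assumed for illustration: a dedicated server thread owns the lock state, clients send it ACQUIRE/RELEASE messages, and each client spins on a private grant flag until the server replies. The hardware per-core receive queues are emulated with a mutex-protected list, so this shows the protocol, not the hardware mechanism or the actual pLock implementation.

```c
/* Software emulation of an EMP-style client-server lock (illustrative). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum msg_type { MSG_ACQUIRE, MSG_RELEASE };

typedef struct request {
    enum msg_type   type;
    atomic_bool    *grant;            /* client spins on this until granted */
    struct request *next;
} request_t;

/* emulated server receive queue (stand-in for the hardware queue) */
static pthread_mutex_t q_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cv  = PTHREAD_COND_INITIALIZER;
static request_t *q_head, *q_tail;

static void send_to_server(request_t *r) {
    pthread_mutex_lock(&q_mtx);
    r->next = NULL;
    if (q_tail) q_tail->next = r; else q_head = r;
    q_tail = r;
    pthread_cond_signal(&q_cv);
    pthread_mutex_unlock(&q_mtx);
}

static request_t *server_recv(void) {
    pthread_mutex_lock(&q_mtx);
    while (!q_head) pthread_cond_wait(&q_cv, &q_mtx);
    request_t *r = q_head;
    q_head = r->next;
    if (!q_head) q_tail = NULL;
    pthread_mutex_unlock(&q_mtx);
    return r;
}

/* client side: one request struct and grant flag per caller */
void emp_lock(request_t *req, atomic_bool *grant) {
    atomic_store(grant, false);
    req->type  = MSG_ACQUIRE;
    req->grant = grant;
    send_to_server(req);
    while (!atomic_load(grant))       /* wait for the server's grant */
        ;
}

void emp_unlock(request_t *req) {
    req->type = MSG_RELEASE;
    send_to_server(req);
}

/* server loop: grants the lock in FIFO order, parking requests while held */
void *lock_server(void *arg) {
    (void)arg;
    bool held = false;
    request_t *wait_head = NULL, *wait_tail = NULL;
    for (;;) {
        request_t *r = server_recv();
        if (r->type == MSG_ACQUIRE) {
            if (!held) { held = true; atomic_store(r->grant, true); }
            else {                    /* lock busy: park the requester */
                r->next = NULL;
                if (wait_tail) wait_tail->next = r; else wait_head = r;
                wait_tail = r;
            }
        } else if (wait_head) {       /* RELEASE: hand off to next waiter */
            request_t *n = wait_head;
            wait_head = n->next;
            if (!wait_head) wait_tail = NULL;
            atomic_store(n->grant, true);
        } else {
            held = false;             /* RELEASE with no waiters */
        }
    }
    return NULL;
}
```

The point of the design is that all contention is serialized at the server, so clients never bounce a shared lock word between caches; they only exchange short request/grant messages with the server core.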
“…The architecture requires a receive queue per core to support the proposed protocol, as seen in Figure 1. The size of each core's receive queue is determined empirically by conducting a study similar to the one presented in Dogan et al. (2017). All workloads are run, and a counter in the simulator records the maximum utilization of the receive queues at any given time for each workload.…”
Section: Explicit Messaging Hardware Overhead
Citation type: mentioning (confidence: 99%)
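A sketch of how such an occupancy counter might look in a simulator is shown below: each core's receive queue keeps a current depth and a high-water mark, and the largest mark observed across all cores and workloads gives the empirically chosen queue size. Structure and function names are illustrative, not taken from the cited simulator.

```c
/* Track the high-water mark of a simulated receive queue. */
#include <stddef.h>

typedef struct {
    size_t depth;       /* current number of queued messages */
    size_t max_depth;   /* high-water mark observed so far */
} rq_stats_t;

static inline void rq_on_enqueue(rq_stats_t *s) {
    if (++s->depth > s->max_depth)
        s->max_depth = s->depth;      /* new maximum utilization */
}

static inline void rq_on_dequeue(rq_stats_t *s) {
    if (s->depth > 0)
        s->depth--;
}

/* After running all workloads, the largest max_depth over all cores and
 * workloads is taken as the receive-queue size to provision in hardware. */
```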
“…Ham et al. [23] proposed the domain-specific Graphicionado, which exploits data-structure-centric datapath specialization and memory-subsystem specialization. Dogan et al. [22] proposed a shared-memory multi-core architecture. By introducing hardware-level messaging instructions into the ISA, this design can accelerate synchronization primitives and move computation toward data more efficiently.…”
Section: Graph Acceleration Architecture
Citation type: mentioning (confidence: 99%)
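As an illustration of "moving computation toward data" with explicit messages, the single-threaded sketch below sends a short update message to the core that owns a vertex instead of performing a remote read-modify-write; the owning core then applies the update locally. Per-core queues are emulated with plain arrays, and the names and partitioning scheme are assumptions for the example, not the instructions described in the cited paper.

```c
/* Emulated "move computation to data": updates travel as messages to the
 * owning core, which applies them locally. Illustrative sketch only. */
#include <assert.h>
#include <stdio.h>

#define NUM_CORES    4
#define NUM_VERTICES 16
#define QUEUE_CAP    64

typedef struct { int vertex; int delta; } update_msg_t;

static update_msg_t queues[NUM_CORES][QUEUE_CAP];
static int queue_len[NUM_CORES];
static int vertex_value[NUM_VERTICES];   /* vertex data, partitioned by owner */

static int owner_of(int v) { return v % NUM_CORES; }

/* sender side: emit an explicit message instead of touching remote data */
static void send_update(int vertex, int delta) {
    int core = owner_of(vertex);
    assert(queue_len[core] < QUEUE_CAP);  /* fixed-size queue in this sketch */
    queues[core][queue_len[core]++] = (update_msg_t){ vertex, delta };
}

/* receiver side: the owning core drains its queue and updates locally,
 * so no remote read-modify-write or per-vertex lock is needed */
static void drain_queue(int core) {
    for (int i = 0; i < queue_len[core]; i++)
        vertex_value[queues[core][i].vertex] += queues[core][i].delta;
    queue_len[core] = 0;
}

int main(void) {
    send_update(5, 3);                    /* e.g., relax two edges into vertex 5 */
    send_update(5, 2);
    send_update(6, 1);
    for (int c = 0; c < NUM_CORES; c++)
        drain_queue(c);
    printf("vertex 5 = %d, vertex 6 = %d\n", vertex_value[5], vertex_value[6]);
    return 0;
}
```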
“…These hardware-level works need external devices for acceleration; thus, their overhead is larger than that of CGAcc, because CGAcc is deployed in the HMC, and the HMC can be treated as the main memory of a computer system. Some software-level works optimized graph processing by enriching the instruction set architecture [22] or customizing the compiler [23]. Software-level works cannot make full use of the hardware.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)