Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems 2019
DOI: 10.1145/3316482.3326351
Optimizing tensor contractions for embedded devices with racetrack memory scratch-pads

Cited by 11 publications (9 citation statements); references 49 publications.
“…The work in [138] investigates the layouts of high-dimensional data structures such as tensors in RTM-based SPMs. For the tensor contraction operation, an optimized data layout reduces the number of shifts by 50% compared to a naïve layout.…”
Section: B. Software Techniques for Minimizing Shifts
confidence: 99%
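The shift-reduction claim can be illustrated with a toy cost model: in a racetrack memory, reading an element requires shifting the track until that element sits under the access port, so the shift count for an access sequence is the sum of distances between consecutively accessed positions. A minimal Python sketch of this model (the layouts and access sequence below are hypothetical illustrations, not data from the paper):

```python
def total_shifts(accesses, layout):
    """Toy RTM shift-cost model: shifts = distance the track must
    move between consecutively accessed domain-wall positions."""
    pos = {elem: i for i, elem in enumerate(layout)}
    port = 0  # access port starts aligned with position 0
    shifts = 0
    for elem in accesses:
        shifts += abs(pos[elem] - port)
        port = pos[elem]
    return shifts

# Hypothetical access pattern alternating between two elements:
accesses = ["a", "b", "a", "b", "a", "b"]
naive = ["a", "x", "y", "b"]      # co-accessed elements far apart
optimized = ["a", "b", "x", "y"]  # co-accessed elements adjacent

print(total_shifts(accesses, naive))      # → 15
print(total_shifts(accesses, optimized))  # → 5
```

Placing co-accessed elements at adjacent domain-wall positions is exactly the kind of layout decision that yields the shift reduction the citing work reports.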
“…For instance, different RTM topologies and structures differ in their error patterns, which need to be analyzed at the architectural level. Similarly, at the compiler level, the memory access patterns of applications can be reordered from higher compiler abstractions, e.g., from a polyhedral model or by additional semantic information from domain-specific languages [138]. There is a need to investigate a runtime system that is flexible enough to adapt to various flavors of racetrack memories (single DW versus multiple DWs; horizontal versus vertical racetrack) and different application characteristics (latency- versus bandwidth-sensitive applications).…”
Section: HW/SW Codesign
confidence: 99%
“…If the local adjacency of a vertex with the left group is greater than that of the right group, then it is added to the left group (cf. lines 12-14). Otherwise, the vertex is added to the right group (cf.…”
Section: The ShiftsReduce Heuristic
confidence: 99%
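The vertex-assignment step quoted above can be sketched as follows. This is a minimal Python illustration of only the single greedy step the quote describes, not the full ShiftsReduce heuristic; the function and variable names are hypothetical:

```python
def assign_vertex(v, neighbors, left, right):
    """Greedy step: place v in the group with which it has the higher
    local adjacency (number of neighbors already in that group).
    Ties fall through to the right group, per the quoted
    'otherwise' branch."""
    adj_left = sum(1 for u in neighbors[v] if u in left)
    adj_right = sum(1 for u in neighbors[v] if u in right)
    if adj_left > adj_right:
        left.add(v)
    else:
        right.add(v)

# Hypothetical example: v has two neighbors already in the left
# group and one in the right group, so it joins the left group.
neighbors = {"v": {"a", "b", "c"}}
left, right = {"a", "b"}, {"c"}
assign_vertex("v", neighbors, left, right)
print("v" in left)  # → True
```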
“…The overall performance and energy benefits of Chen, ShiftsReduce, and IGA-Ours compared to OFU translate to 22.2%, 25.4%, and 31.7% (performance) and 12.4%, 17.5%, and 26.4% (energy), respectively. The suitability of RMs compared to other memory technologies such as SRAM, STT-MRAM, and DRAM has already been established [13,30,48].…”
Section: Summary: Performance and Energy Analysis
confidence: 99%