2022
DOI: 10.48550/arXiv.2207.07886
Preprint

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

Abstract: Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., computing systems with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is…
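To make the data-movement argument concrete, here is a back-of-envelope sketch of the arithmetic intensity of one SGD epoch over a large training set. All sizes are assumptions chosen for illustration, not figures from the paper; an intensity around 1 FLOP/byte sits well below the machine balance of modern CPUs and GPUs (roughly ten or more FLOP/byte), which is why such training loops end up memory-bound:

```python
# Back-of-envelope arithmetic intensity of one SGD epoch for linear
# regression. All values below are illustrative assumptions, not
# measurements from the paper.

n_samples = 10_000_000   # training examples (assumed)
n_features = 16          # features per example (assumed)
bytes_per_elem = 4       # float32

# Data moved per epoch: the full dataset is streamed from memory.
bytes_moved = n_samples * n_features * bytes_per_elem

# FLOPs per sample for the squared-loss gradient w.r.t. the weights:
# a dot product (~2*f) plus an AXPY-style weight update (~2*f).
flops = n_samples * 4 * n_features

intensity = flops / bytes_moved  # FLOP per byte
print(f"arithmetic intensity = {intensity:.2f} FLOP/byte")
# ~1 FLOP/byte: well below the machine balance of modern processors,
# so the loop is limited by memory bandwidth, not compute throughput.
```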

Cited by 3 publications (2 citation statements) · References 98 publications
“…Compared to inference, training is by far more compute and memory intensive and therefore the major challenge to address in consumer devices. The training procedure of Neural Networks (NNs), in fact, relies on BP [21] which is memory-bounded [40]. This is due to the storage of NN's activations, resulting in slow and energy-consuming operations.…”
Section: On-device Learning
Citation type: mentioning (confidence: 99%)
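The activation-storage cost that this citing work attributes to backpropagation can be sketched with a quick estimate. The layer widths and batch size below are hypothetical, chosen only to show that activation storage grows with batch size while weight storage does not:

```python
# Rough estimate of the activation storage backpropagation requires
# for a small MLP. Layer sizes and batch size are assumptions for
# illustration, not taken from the cited works.

batch_size = 1024
layer_widths = [784, 1024, 1024, 512, 10]  # hypothetical MLP
bytes_per_elem = 4                         # float32

# The backward pass consumes each layer's output activations, so they
# must all be kept live; storage scales with depth and batch size.
activation_bytes = sum(batch_size * w * bytes_per_elem
                       for w in layer_widths[1:])

# Weights, by contrast, are stored once regardless of batch size.
weight_bytes = sum(a * b * bytes_per_elem
                   for a, b in zip(layer_widths, layer_widths[1:]))

print(f"activations: {activation_bytes / 1e6:.1f} MB, "
      f"weights: {weight_bytes / 1e6:.1f} MB")
# Activations already exceed the weights at this batch size, and the
# gap widens as the batch grows, driving the memory traffic of training.
```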
“…Many previous works investigate how to provide new functionality using compute-capable memories based on conventional (e.g., [1,2,36,40,43,72,99,108,110]) and emerging memory technologies (e.g., [9,27,56,59,68,73,101,111,116,117,119,121,139,144]) to help solve the data movement overheads in today's systems. These works propose new functionality in at least three major categories: (1) support for logical operations (e.g., [26,73,86,110,119,121,139,144]), (2) support for complex operations, functions, and applications (e.g., [1,36,72,89,111,112,116,117,143]), and (3) programming and system support for the integration and adoption of such accelerators (e.g., [2,3,9,19,27,55,…”
Section: Computation-in-memory Accelerators
Citation type: mentioning (confidence: 99%)
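As an illustration of category (1) in the quote above, the following NumPy sketch models only the semantics of a bulk bitwise operation, which Ambit-style in-DRAM proposals execute inside the memory array itself; in a real PIM substrate neither operand row would cross the memory bus. The row size is an assumption:

```python
# Software analogue of a "bulk bitwise" in-memory primitive: an AND
# across two whole DRAM rows at once. This models the semantics only;
# hardware proposals perform it inside the memory array, without
# moving either row to the CPU.
import numpy as np

row_bytes = 8192  # one DRAM row (assumed size)
row_a = np.random.randint(0, 256, row_bytes, dtype=np.uint8)
row_b = np.random.randint(0, 256, row_bytes, dtype=np.uint8)

# Processor-centric: both rows cross the memory bus to be ANDed.
# PIM: the result is produced in place, row-wide, in one operation.
result = row_a & row_b
print(result[:8])
```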