Power-Optimal Mapping of CNN Applications to Cloud-Based Multi-FPGA Platforms

Shan, Junnan; Lazarescu, Mihai T.; Cortadella, Jordi; Lavagno, Luciano; Casu, Mario R.

doi:10.1109/tcsii.2020.2998284

Cited by 10 publications

(12 citation statements)

References 7 publications


“…(2) CNN (Convolution Neural Network) is a neural network structure algorithm based on a multilayer perceptron. CNN can effectively learn semantic features and has been successfully applied in various fields [19]. It is generally composed of three parts: an input layer, output layer, and hidden layer.…”

Section: Methods (mentioning)

confidence: 99%

The use of machine translation algorithm based on residual and LSTM neural network in translation teaching

Ren

2020

PLoS ONE


With the rapid development of big data and deep learning, breakthroughs have been made in phonetic and textual research, the two fundamental attributes of language. Language is an essential medium of information exchange in teaching activity. The aim is to promote the transformation of the training mode and content of the translation major and the application of the translation service industry in various fields. Based on previous research, the SCN-LSTM (Skip Convolutional Network and Long Short Term Memory) translation model of deep learning neural network is constructed by learning and training the real dataset and the public PTB (Penn Treebank Dataset). The feasibility of the model’s performance, translation quality, and adaptability in practical teaching is analyzed to provide a theoretical basis for the research and application of the SCN-LSTM translation model in English teaching. The results show that the capability of the neural network for translation teaching is nearly one times higher than that of the traditional N-tuple translation model, and the fusion model performs much better than the single model in translation quality and teaching effect. To be specific, the accuracy of the SCN-LSTM translation model based on deep learning neural network is 95.21%, the degree of translation confusion is reduced by 39.21% compared with that of the LSTM (Long Short Term Memory) model, and the adaptability is 0.4 times that of the N-tuple model. With the highest level of satisfaction in practical teaching evaluation, the SCN-LSTM translation model has achieved a favorable effect on the translation teaching of the English major. In summary, the performance and quality of the translation model are improved significantly by learning the language characteristics in translations by teachers and students, providing ideas for applying machine translation in professional translation teaching.



“…The detailed power model is discussed in [3]. 1) Static power: includes the DDR static power, $P_{DDRs}$, and the FPGA static power, $P_{fs}$.…”

Section: Power Modeling (mentioning)

confidence: 99%

“…The dynamic power of FPGA $f$, $P_{fd,f}$, depends on the number of CUs of each kernel allocated to it, $n_{k,f}$, and scales with the clock frequency. The detailed equation for the calculation of the DDR dynamic power is discussed in [3].…”

Section: Power Modeling (mentioning)

confidence: 99%
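
The two power-modeling excerpts above describe a static term plus a per-FPGA dynamic term. A minimal sketch of that structure follows, assuming the dynamic power is linear in the CU count of each kernel and scales linearly with clock frequency. The function names, the per-CU coefficients, and the reference frequency are hypothetical placeholders, not values or equations from the cited papers, and the DDR dynamic term (deferred to [3] in the excerpt) is left out.

```python
def fpga_dynamic_power(cus_per_kernel, watts_per_cu, freq_mhz, ref_freq_mhz):
    """Dynamic power of one FPGA under an ASSUMED linear model.

    cus_per_kernel[k] plays the role of n_{k,f}; watts_per_cu[k] is a
    hypothetical per-CU power measured at ref_freq_mhz.
    """
    base = sum(n * w for n, w in zip(cus_per_kernel, watts_per_cu))
    # Assumed linear scaling with clock frequency.
    return base * (freq_mhz / ref_freq_mhz)


def total_power(p_ddr_static, p_fpga_static, dynamic_per_fpga):
    """Static terms (DDR + FPGA) plus the dynamic power of every FPGA."""
    return p_ddr_static + p_fpga_static + sum(dynamic_per_fpga)
```

For instance, `fpga_dynamic_power([2, 1], [0.5, 1.0], 125, 250)` halves the 2.0 W reference figure because the clock runs at half the reference frequency.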

“…As in our previous work [3], we split the applications into several pipelined kernels (e.g., one per macro-layer in a Deep Neural Network) and profile each of them with Xilinx SDAccel [4] for resource usage (LUTs, FFs, DSPs, BRAMs), execution time, and DDR memory bandwidth. Then we input the profiled kernels and available FPGA resources into our power and performance model to obtain a power-optimal implementation to program the FPGAs.…”

Section: Introduction (mentioning)

confidence: 99%

“…Efficient resource allocation for high-performance data center applications is a well-studied topic for GPUs, CPUs, and FPGAs. Previously, we provided detailed resource allocation models to maximize the application throughput, optimized with both an MINLP solver and a heuristic method, but without considering power consumption [5], [6] and we demonstrated in [3] a multi-objective optimization that minimizes energy consumption while meeting performance requirements using an MINLP solver, which is very slow. Here, we propose two heuristic solvers, which can be several orders of magnitude faster.…”

Section: Introduction (mentioning)

confidence: 99%


Fast Energy-Optimal Multikernel DNN-Like Application Allocation on Multi-FPGA Platforms

Shan, Lazarescu, Cortadella, et al.

2022

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

Self Cite


Platforms with multiple Field Programmable Gate Arrays (FPGAs), such as Amazon Web Services (AWS) F1 instances, can efficiently accelerate multi-kernel pipelined applications, e.g., Convolutional Neural Networks for machine vision tasks or transformer networks for Natural Language Processing tasks. To reduce energy consumption when the FPGAs are underutilized, we propose a model to (1) find the minimum-power solution offline for given throughput constraints, and (2) dynamically reprogram the FPGA at runtime (which is complementary to dynamic voltage and frequency scaling) to best match the workloads when they change. The offline optimization model can be solved using a Mixed-Integer Non-Linear Programming (MINLP) solver, but it can be very slow. Hence, we provide two heuristic optimization methods that improve result quality within a bounded time. We use several very large designs to demonstrate that both heuristics obtain results comparable to MINLP when it can find the best solution, and much better results than MINLP when it cannot find the optimum within a bounded amount of time. The heuristic methods can also be thousands of times faster than the MINLP solver.
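
The offline problem in this abstract, minimum power under a throughput constraint, can be illustrated with a toy exhaustive search. This is neither the MINLP formulation nor either of the paper's heuristics: it assumes a single FPGA, a pipeline whose throughput is set by its slowest kernel, power linear in CU count, and no resource limits beyond a per-kernel CU cap. All names and numbers are hypothetical.

```python
from itertools import product


def min_power_allocation(rates, watts_per_cu, max_cus, min_throughput):
    """Brute-force the CU counts that meet a throughput floor at least power.

    rates[k]        -- hypothetical throughput of one CU of kernel k
    watts_per_cu[k] -- hypothetical power of one CU of kernel k
    Returns (allocation, power), or None if no allocation is feasible.
    """
    best = None
    for alloc in product(range(1, max_cus + 1), repeat=len(rates)):
        # Pipeline throughput is limited by the slowest stage (assumption).
        throughput = min(n * r for n, r in zip(alloc, rates))
        if throughput < min_throughput:
            continue
        power = sum(n * w for n, w in zip(alloc, watts_per_cu))
        if best is None or power < best[1]:
            best = (alloc, power)
    return best
```

The search space grows as `max_cus ** len(rates)`, which hints at why exact solvers become slow on large designs and why bounded-time heuristics are attractive.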
