Efficient FPGA Implementation of Multilayer Perceptron for Real-Time Human Activity Classification

Gaikwad, Nikhil B.; Tiwari, Varun; Keskar, Avinash G.; Shivaprakash, N. C.

doi:10.1109/access.2019.2900084

Cited by 80 publications (42 citation statements)

References 36 publications


“…Finally, for very small networks, such as the ones used in applications B and C, the runtime is far below the millisecond range. If the application scenario requires only very few classifications per cluster activation, then the IBEX core is the most energy-efficient one, with a consumption of 2.9 µJ and 0.15 µJ for applications B and C, respectively. Compared with the work in [46] for application C, the IBEX core is 13.5× faster in computation time and 434× more energy efficient than a parallel FPGA implementation. However, if continuous classification is required, which is the case for the vast majority of IoT applications, then the parallel execution once again outperforms the IBEX core in terms of speed and energy efficiency.…”

Section: Experimental Evaluation and Results (mentioning, confidence: 99%)
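Because energy per classification equals average power multiplied by runtime, the two ratios quoted for application C also fix the implied ratio of average power draw. A minimal Python sketch, assuming E = P·t applies to both platforms for the same classification workload; the function name `implied_power_ratio` is hypothetical:

```python
def implied_power_ratio(speedup: float, energy_ratio: float) -> float:
    """Return P_ref / P_fast from runtime and energy ratios.

    speedup      = t_ref / t_fast  (how many times faster the fast platform is)
    energy_ratio = E_ref / E_fast  (how many times less energy it uses)
    With E = P * t:  E_ref / E_fast = (P_ref / P_fast) * (t_ref / t_fast),
    so P_ref / P_fast = energy_ratio / speedup.
    """
    if speedup <= 0 or energy_ratio <= 0:
        raise ValueError("ratios must be positive")
    return energy_ratio / speedup


# Figures quoted above for application C: IBEX core vs. the parallel FPGA design in [46].
ratio = implied_power_ratio(speedup=13.5, energy_ratio=434.0)
print(f"Implied FPGA-to-IBEX average power ratio: {ratio:.1f}x")  # about 32.1x
```

The result is only meaningful if both ratios refer to the same workload and the same measurement boundary; static or idle power outside the measured window is not captured.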

“…MLPs have been successfully used in a wide range of application scenarios, such as disease detection [45], activity recognition [46], and brain-machine interfaces [47]. Many studies identified MLPs as the best or one of the best algorithms to solve tasks in the IoT domain using wearable devices [48]–[51].…”

Section: Application Showcases (mentioning, confidence: 99%)

“…The authors in [46] proposed an FPGA implementation of MLPs with parallel computation to classify human activity in real time. The dataset is acquired by a triaxial accelerometer worn on the waist and classified into five activity classes.…”

Section: Human Activity Classification (mentioning, confidence: 99%)
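The parallelism exploited in such FPGA designs comes from the fact that every neuron in a layer computes an independent weighted sum over the same inputs. A minimal pure-Python sketch of a 3-input, 5-class MLP, assuming one hidden layer of four sigmoid neurons and hypothetical placeholder weights (the layer size, the activation, and the weights are not taken from [46]):

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def identity(z: float) -> float:
    return z


def dense(x: Sequence[float], weights: Matrix, biases: Sequence[float],
          act: Callable[[float], float]) -> Vector:
    # Each output neuron depends only on x and its own weight row, so in hardware
    # all rows can be evaluated at once by separate multiply-accumulate units.
    return [act(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def classify(sample: Sequence[float], w1: Matrix, b1: Sequence[float],
             w2: Matrix, b2: Sequence[float]) -> int:
    hidden = dense(sample, w1, b1, sigmoid)   # 3 accelerometer axes -> 4 hidden units
    scores = dense(hidden, w2, b2, identity)  # 4 hidden units -> 5 activity scores
    return max(range(len(scores)), key=lambda k: scores[k])  # index of the winning class


# Hypothetical placeholder parameters, for shape illustration only.
W1 = [[0.8, -0.2, 0.1], [-0.5, 0.9, 0.3], [0.2, 0.4, -0.7], [0.6, 0.6, 0.6]]
B1 = [0.0, 0.1, -0.1, 0.0]
W2 = [[1.0, -1.0, 0.5, 0.0], [-0.3, 0.8, 0.2, 0.1], [0.4, 0.4, -0.6, 0.9],
      [0.0, -0.5, 1.0, -0.2], [-0.7, 0.2, 0.3, 0.5]]
B2 = [0.0, 0.0, 0.0, 0.0, 0.0]

print(classify([0.02, -0.98, 0.15], W1, B1, W2, B2))  # an index in 0..4
```

A software loop evaluates the weight rows one after another; a parallel hardware design instead instantiates the arithmetic for several rows side by side, trading logic resources for latency.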


FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things

Wang, Magno, Cavigelli, et al., 2020

IEEE Internet Things J.

130


The growing number of low-power smart devices in the Internet of Things is coupled with the concept of "Edge Computing", that is, moving some of the intelligence, especially machine learning, towards the edge of the network. Enabling machine learning algorithms to run on resource-constrained hardware, typically on low-power smart devices, is challenging in terms of hardware (optimized and energy-efficient integrated circuits), algorithmic, and firmware implementations. This paper presents FANN-on-MCU, an open-source toolkit built upon the Fast Artificial Neural Network (FANN) library to run lightweight and energy-efficient neural networks on microcontrollers based on both the ARM Cortex-M series and the novel RISC-V-based Parallel Ultra-Low-Power (PULP) platform. The toolkit takes multi-layer perceptrons trained with FANN and generates code targeted to low-power microcontrollers. This paper also presents detailed analyses of energy efficiency across the different cores and of the optimizations needed to handle different network sizes. Moreover, it provides a detailed analysis of parallel speedups and of degradations due to parallelization overhead and memory transfers. Further evaluations include experimental results for three different applications using a self-sustainable wearable multi-sensor bracelet. Experimental results show a measured latency on the order of only a few microseconds and power consumption of a few milliwatts, while keeping memory requirements below the limitations of the targeted microcontrollers. In particular, the parallel implementation on the octa-core RISC-V platform reaches a speedup of 22× and a 69% reduction in energy consumption with respect to a single-core implementation on Cortex-M4 for continuous real-time classification.


Section: Experimental Evaluation and Results (mentioning, confidence: 99%)

Section: Application Showcases (mentioning, confidence: 99%)
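Microcontrollers without a floating-point unit commonly evaluate such networks in fixed-point arithmetic, and the FANN library itself provides a fixed-point mode. The sketch below shows a single fixed-point neuron in a generic Q-format; the 12 fractional bits and all helper names are assumptions for illustration, not the code that FANN-on-MCU generates.

```python
FRAC_BITS = 12                    # assumed Q-format: 12 fractional bits
ONE = 1 << FRAC_BITS              # fixed-point representation of 1.0
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def to_fixed(x: float) -> int:
    return int(round(x * ONE))


def to_float(q: int) -> float:
    return q / ONE


def saturate32(v: int) -> int:
    # Clamp to the int32 range a 32-bit MCU register would hold.
    return max(INT32_MIN, min(INT32_MAX, v))


def fixed_neuron(weights_q, inputs_q, bias_q: int) -> int:
    """Weighted sum of one neuron; operands and result in Q(FRAC_BITS)."""
    acc = bias_q << FRAC_BITS     # align the bias with the Q(2*FRAC_BITS) products
    for w, x in zip(weights_q, inputs_q):
        acc += w * x              # each product of two Q12 values is Q24
    # Shift back to Q12; >> is an arithmetic shift (rounds toward negative infinity).
    return saturate32(acc >> FRAC_BITS)


w = [to_fixed(v) for v in (0.5, -0.25)]
x = [to_fixed(v) for v in (1.0, 2.0)]
print(to_float(fixed_neuron(w, x, to_fixed(0.1))))  # close to 0.1, within one LSB
```

Python integers never overflow, so the accumulator here is effectively unbounded; on a 32-bit MCU the intermediate Q24 sum would need a 64-bit accumulator or saturating multiply-accumulate instructions.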


“…Gaikwad et al. [29] proposed a hardware FPGA implementation for military equipment that uses an MLP algorithm to perform classification tasks. Parallel MLP computation was implemented to achieve an enhanced hardware design.…”

Section: Hardware ANN FPGA Implementation (mentioning, confidence: 99%)

Embedded Artificial Neural Network FPGA Controlled Cart

Ahmad, Alhady, Oon, et al., 2019

Adv. sci. technol. eng. syst. j.


An artificial neural network (ANN) computing system can be significantly influenced by its implementation type. A software-implemented ANN can produce high-accuracy output but with slow computation, whereas a hardware-implemented ANN runs faster but with lower accuracy. Normally, software implementation reduces the proficiency and efficiency of the model. Robot performance is critical because a robot that processes information with an ANN needs a fast response. Consequently, the proposed research focuses on a comparison between hardware and software implementations of a multilayer perceptron (MLP) for a cart follower on a Field Programmable Gate Array (FPGA). Both the software and hardware models produced the same precision: the output distance at angles of −10°, 0°, and 10° shows the same percentage error. In addition, both models have similar root mean square errors (RMSE) of 0.469, 0.479, and 0.267 at −10°, 0°, and 10°, respectively. The processing times of the MLP model implemented in hardware and software are 1.91 μs and 78.06 μs, respectively. Thus, it can be concluded that the hardware implementation is better than the software implementation.
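The two figures of merit in this abstract are simple to compute: RMSE compares predicted and reference outputs, and the speedup is the ratio of the software to the hardware processing time. A minimal Python sketch; the function names are illustrative, and no data from the paper is used beyond the two reported timings.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(predicted))


def speedup(t_slow: float, t_fast: float) -> float:
    """How many times faster the fast implementation is (same time unit for both)."""
    return t_slow / t_fast


# Reported processing times: software 78.06 us, hardware 1.91 us.
print(f"hardware speedup: {speedup(78.06, 1.91):.1f}x")  # about 40.9x
```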


“…The development of wearable devices leads to the implementation of ML algorithms directly on board [18,19], allowing for a reduction in the amount of data to be transmitted, with considerable advantages in terms of power consumption and system usability [14]. To address the need for platforms with good computing capacity, dedicated hardware architectures such as field programmable gate arrays (FPGAs) can be selected for the implementation of the algorithms instead of general-purpose processors [20]–[23]. This allows control over the resources needed for the task and optimization of the system for performance or physical size, depending on the use case.…”

Section: Introduction (mentioning, confidence: 99%)

A Model-Based Design Floating-Point Accumulator. Case of Study: FPGA Implementation of a Support Vector Machine Kernel Function

Bassoli, Bianchi, Munari, 2020

Sensors


Recent research in wearable sensors has led to the development of advanced platforms capable of embedding complex algorithms such as machine learning algorithms, which are known to be resource-demanding. To address the need for high computational power, one solution is to design custom hardware platforms dedicated to the specific application by exploiting, for example, Field Programmable Gate Arrays (FPGAs). Recently, model-based techniques and automatic code generation have been introduced in FPGA design. In this paper, a new model-based floating-point accumulation circuit is presented. The architecture is based on the state-of-the-art delayed buffering algorithm. This circuit was designed to compute the kernel function of a support vector machine. The proposed model was implemented in Simulink, and simulation results showed that it performed better in terms of speed and occupied area than other solutions. To better evaluate its performance, the practical case of a polynomial kernel function was considered. Simulink and VHDL post-implementation timing simulations and measurements on FPGA confirmed the good results of the stand-alone accumulator.
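The accumulator's role is to sum the element-wise products inside the kernel evaluation. A minimal Python sketch of a polynomial kernel K(x, y) = (γ·⟨x, y⟩ + c)^d, with γ, c, and d set to illustrative defaults; the sequential loop stands in for the accumulation circuit and does not reproduce the delayed buffering algorithm. In hardware, a pipelined floating-point adder needs several cycles per addition, so a loop that feeds each partial sum straight back into the adder stalls, which is the problem dedicated accumulator architectures address.

```python
from typing import Iterable, Sequence


def accumulate(products: Iterable[float]) -> float:
    # Sequential floating-point accumulation (software stand-in for the accumulator).
    total = 0.0
    for p in products:
        total += p
    return total


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      gamma: float = 1.0, coef0: float = 1.0, degree: int = 2) -> float:
    """K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    dot = accumulate(xi * yi for xi, yi in zip(x, y))
    return (gamma * dot + coef0) ** degree


print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))  # (11 + 1)^2 = 144.0
```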
