Deep neural networks have proven to be particularly effective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms of both throughput and energy efficiency. Application-tailored accelerators, when co-designed with approximation-based network training methods, transform large, dense and computationally expensive networks into small, sparse and hardware-efficient alternatives, increasing the feasibility of network deployment. In this article, we provide a comprehensive evaluation of approximation methods for high-performance network inference along with in-depth discussion of their effectiveness for custom hardware implementation. We also include proposals for future research based on a thorough analysis of current trends. This article represents the first survey providing detailed comparisons of custom hardware accelerators featuring approximation for both convolutional and recurrent neural networks, through which we hope to inspire exciting new developments in the field.
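As a toy illustration of the kind of approximation method the survey covers, the sketch below combines magnitude pruning with symmetric uniform quantization. The function name, the sparsity and bit-width choices, and the NumPy implementation are our own assumptions for illustration, not techniques taken from the article:

```python
import numpy as np

def prune_and_quantize(weights, sparsity=0.5, bits=8):
    """Magnitude-prune a weight tensor, then uniformly quantize the survivors."""
    flat = np.abs(weights).ravel()
    # Zero out the smallest-magnitude fraction of weights (magnitude pruning).
    threshold = np.quantile(flat, sparsity)
    pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)
    # Symmetric uniform quantization to a signed `bits`-bit grid.
    scale = np.max(np.abs(pruned)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return pruned
    q = np.round(pruned / scale).astype(np.int8)
    # Return the dequantized tensor so accuracy loss can be measured directly.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_hat = prune_and_quantize(w, sparsity=0.5, bits=8)
print(np.mean(w_hat == 0))  # about half the weights are now zero
```

The resulting tensor is both sparse (skippable multiply-accumulates) and low-precision (narrower datapaths), which is exactly the combination that makes such networks attractive for custom hardware.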
Algorithms based on Convolutional Neural Networks (CNNs) have been successful in solving image recognition problems, achieving very large accuracy improvements. In recent years, deconvolution layers have been widely used as key components in state-of-the-art CNNs for end-to-end training and in models supporting tasks such as image segmentation and super-resolution. However, deconvolution algorithms are computationally intensive, which limits their applicability to real-time applications. In particular, there has been little research on efficient implementations of deconvolution algorithms on FPGA platforms, which have been widely used by practitioners and researchers to accelerate CNN algorithms due to their high performance and power efficiency. In this work, we propose and develop a deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. In addition, memory sharing between the computation modules is proposed for the FPGA-based CNN accelerator, alongside other optimization techniques. A non-linear optimization model based on the performance model is introduced to efficiently explore the design space, achieving optimal processing speed and improved power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate a low-latency hardware design for any given CNN model on the target device. Finally, we implement our designs on a Xilinx Zynq ZC706 board; the deconvolution accelerator achieves a performance of 90.1 GOPS at a 200 MHz working frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantization, significantly outperforming previous FPGA designs.
A real-time scene segmentation application on the Cityscapes dataset is used to evaluate our CNN accelerator on the Zynq ZC706 board. The system achieves a performance of 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization, and supports up to 17 frames per second for 512×512 image inputs with a power consumption of only 9.6 W.
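The deconvolution (transposed convolution) operation that this accelerator targets can be expressed in software as zero-insertion followed by an ordinary full convolution. The minimal NumPy sketch below is our own illustration of that equivalence, not the paper's hardware algorithm:

```python
import numpy as np

def deconv2d(x, k, stride=2):
    """Transposed convolution ("deconvolution") via zero-insertion:
    upsample the input by `stride`, then apply a full convolution."""
    h, w = x.shape
    kh, kw = k.shape
    # Insert (stride - 1) zeros between neighbouring input elements.
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    up[::stride, ::stride] = x
    # "Full" padding so the kernel sweeps every overlap position.
    pad = np.pad(up, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    oh, ow = pad.shape[0] - kh + 1, pad.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            # Correlate with the flipped kernel, i.e. true convolution.
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * k[::-1, ::-1])
    return out
```

The zero-insertion step is what makes naive implementations wasteful (most multiplies are by zero), which is one reason dedicated deconvolution hardware, as proposed here, can pay off.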
Algorithms based on Convolutional Neural Networks (CNNs) have recently been applied to object detection applications, greatly improving their performance. However, many devices intended for these algorithms have limited computation resources and strict power consumption constraints, and are not suitable for algorithms designed for GPU workstations. This paper presents a novel method to optimise CNN-based object detection algorithms targeting embedded FPGA platforms. Given parameterised CNN hardware modules, an optimisation flow takes network architectures and resource constraints as input, and tunes hardware parameters with algorithm-specific information to explore the design space and achieve high performance. The evaluation shows that the accuracy of our design model is above 85% and that, with an optimised configuration, our design achieves a 49.6 times speed-up compared with a software implementation.
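The optimisation flow described above, which takes resource constraints and tunes hardware parameters, can be caricatured as a small exhaustive design-space search. The resource costs, the performance model, and the budget figures below are invented placeholders for illustration, not numbers from the paper:

```python
from itertools import product

# Hypothetical budgets, loosely in the range of a mid-size FPGA (assumed).
DSP_BUDGET, BRAM_BUDGET = 900, 545

def explore(layer_ops, candidates=(1, 2, 4, 8, 16, 32, 64)):
    """Search over (input, output) channel parallelism factors and return
    the feasible design point with the fewest modeled cycles."""
    best = None
    for tn, tm in product(candidates, repeat=2):
        dsps = tn * tm           # one MAC unit per parallel lane (assumed)
        brams = 2 * (tn + tm)    # double-buffered on-chip tiles (assumed)
        if dsps > DSP_BUDGET or brams > BRAM_BUDGET:
            continue  # infeasible: exceeds the resource constraints
        cycles = layer_ops / (tn * tm)  # idealized compute-bound latency
        if best is None or cycles < best[0]:
            best = (cycles, tn, tm)
    return best

print(explore(1e9))  # (modeled cycles, tn, tm) for a 1-GOP layer
```

A real flow, like the one in the paper, would replace the toy cost and latency formulas with calibrated models per CNN layer, but the constrained-search structure is the same.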
Objective To analyze the metastasis rate in intraglandular lymph nodes (IGLNs) with a focus on discussing the significance of IGLN metastasis in local control (LC) of parotid gland cancer (PGC). Methods A total of 337 patients were enrolled. Information including age; sex; and pathologic variables such as tumor (T) stage, IGLN metastasis, and follow‐up findings was extracted and analyzed. Results IGLN metastasis was noted in 111 (32.9%) patients. Tumor stage, pathologic nodal stage, perineural invasion, resection status, and lymphovascular invasion were significantly related to IGLN metastasis. Local recurrence was noted in 67 (19.9%) patients. IGLN metastasis was an independent predictor of LC. The 10‐year LC rate was 94% for patients without IGLN metastasis, 56% for patients with metastasis in no more than two IGLNs, and 22% for patients with metastasis in more than two IGLNs. This difference was significant (P < 0.001). Conclusion The IGLN metastasis rate is relatively high in PGC patients and is significantly associated with disease grade and T stage. IGLN metastasis is associated with poorer LC, and patients with more than two metastatic nodes have the worst prognosis. Level of Evidence: 4. Laryngoscope, 129:2309–2312, 2019.