The proposal of the ubiquitous power Internet of Things (UPIoT) has increased the demand for communication coverage and data collection of smart grid; the quantity and quality of communication networks are facing greater challenges. This brief applies (73, 37, 13) quadratic residue (QR) codes to power line carrier technology to improve the quality of local data communication in UPIoT. In order to improve the decoding performance of the QR codes, an induction method for the error pattern is proposed, which can divide the originally coupled error pattern into six parts and reuse the same module for decoding. This method greatly reduces the resource requirements, so that (73, 37, 13) QR code can be implemented on FPGA hardware. Notably, the hardware architecture is a modular framework, which can fit into an FPGA with different sizes. As an example (73, 37, 13), QR code is implemented on Intel Arria10 FPGA; the experimental result shows that the maximum decoding frequency of this architecture is 21.7 M Hz, which achieves 4121x speedup compared to CPU. Moreover, the proposed architecture benefits from high flexibility, such as modular design and decoding framework in the form of the pipeline which can be seen as an alternative scheme for decoding long-length QR codes.
Depthwise separable convolution (DSC) significantly reduces parameter and floating operations with an acceptable loss of accuracy and has been widely used in various lightweight convolutional neural network (CNN) models. In practical applications, however, DSC accelerators based on graphics processing units (GPUs) cannot fully exploit the performance of DSC and are unsuitable for mobile application scenarios. Moreover, low resource utilization due to idle engines is a common problem in DSC accelerator design. In this paper, a high-performance DSC hardware accelerator based on field-programmable gate arrays (FPGAs) is proposed. A highly reusable and scalable multiplication and accumulation engine is proposed to improve the utilization of computational resources. An efficient convolution algorithm is proposed for depthwise convolution (DWC) and pointwise convolution (PWC), respectively, to reduce the on-chip memory occupancy. Meanwhile, the proposed convolution algorithms achieve partial fusion between PWC and DWC, and improve the off-chip memory access efficiency. To maximise bandwidth utilization and reduce latency when reading feature maps, an address mapping method for off-chip accesses is proposed. The performance of the proposed accelerator is demonstrated by implementing MobileNetV2 on an Intel Arria 10 GX660 FPGA by using Verilog HDL. The experimental results show that the proposed DSC accelerator achieves a performance of 205.1 FPS, 128.8 GFLOPS, and 0.24 GOPS/DSP for input images of size 224×224×3.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.