Analysis, Modeling and Optimization of Equal Segment Based Approximate Adders

Dutt, Sunil; Dash, Satyabrata; Nandi, Sukumar; Trivedi, Gaurav

doi:10.1109/tc.2018.2871096

Cited by 18 publications

(6 citation statements)

References 32 publications

“…Since the bit length of sub adder determines the latency of ESA, O(log(k)), the delay of adder reduces with a decrease of k. However, the accuracy of ESA grows with an increase of k, which makes it essential to trade off the value of k in light of the application's error resilience when designing ESA. A straightforward way to improve ESA's accuracy is to increase the length of each sub adder and ensure that the number of sub adders remains the same through overlapping among the sub adders, which will not change the adder's delay [82], [83]. Another strategy to improve the adder's accuracy is to get more information for carry prediction by transferring the carry from adjacent sub adder to sum generator while retaining the length of sub adder unchanged.…”
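The trade-off on k described in this statement can be illustrated with a small behavioural model. This is a minimal sketch, not the design from the cited paper: it assumes an n-bit adder split into independent k-bit sub adders with no carry passed between them, and it keeps the carry-out of the most significant segment. The function name `esa_add` and its parameters are hypothetical.

```python
def esa_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    """Approximate n-bit addition with independent k-bit sub adders.

    Assumption for this sketch: no carry crosses a segment boundary, and
    only the most significant segment keeps its carry-out.
    """
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    mask = (1 << k) - 1
    result = 0
    for shift in range(0, n, k):
        # Each sub adder sees only its own k-bit slice of the operands.
        seg_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        if shift + k < n:
            seg_sum &= mask  # carry-out of an interior segment is dropped
        result |= seg_sum << shift
    return result


if __name__ == "__main__":
    a, b = 0x0F0F, 0x0101
    for k in (4, 8, 16):
        print(f"k={k:2d}: approx={esa_add(a, b, 16, k):#06x} exact={a + b:#06x}")
```

For these operands the k = 4 model loses both carries and returns 0x0000, while k = 8 and k = 16 return the exact sum 0x1010, which matches the statement that accuracy grows with k at the cost of a longer sub adder.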

Section: B Hardware Level (mentioning)

confidence: 99%

A Systematic Study of Tiny YOLO3 Inference: Toward Compact Brainware Processor With Less Memory and Logic Gate

Endoh

2020

IEEE Access

The emergence of deep neural networks, especially the convolutional neural network (CNN), has substantially promoted the development of brainware processors for object detection. However, the large network architecture poses severe challenges for the design of a brainware processor, which requires a large number of logic gates and memories. A compact brainware processor with less memory and fewer logic gates is therefore in high demand for object detection. Object detectors are typically divided into single-shot and multi-shot detectors according to their detection principle. In the early stage, multi-shot detectors, such as region-based convolutional neural networks (R-CNNs) and faster R-CNNs, played the leading role in solving object detection problems. However, the multi-shot detector suffers from a low detection rate compared with the single-shot detector. The you only look once (YOLO) algorithm, a state-of-the-art real-time object detection algorithm, has received extensive attention from academia and industry. In particular, the lightweight YOLO algorithm, tiny YOLO3, has excellent potential for the circuit design of compact brainware processors. Nonetheless, systematic studies of tiny YOLO3 are still missing. This paper offers a thorough review of the tiny YOLO3 algorithm to fill this gap in the field of object detection. Furthermore, open solutions for compressing the tiny YOLO3 algorithm are proposed from the aspects of algorithm, hardware and emerging technology. The comprehensive study presented in this paper can not only enhance the understanding of the tiny YOLO3 algorithm for researchers and engineers but also make a significant contribution to accelerating the development of compact brainware processors.

INDEX TERMS: Tiny YOLO3, brainware processor, deep neural network, CNN, hardware acceleration

II. RELATED WORK

A. YOLO DEVELOPMENT

J. Redmon first proposed the YOLO algorithm in May 2016 [22], and it has evolved through four generations within four years. The base YOLO, motivated by fast R-CNNs, introduces the region-based concept to the neural network. Specifically, the input image is divided into grid cells, and two bounding boxes are predicted in each cell. For each bounding box, the center coordinate of the object, the confidence score and the class probabilities are predicted. The confidence score indicates whether or not an object exists in the bounding box. The base YOLO offers good speed and acceptable accuracy, but the following drawbacks hold back its development:

• The base YOLO has difficulty handling two objects that are very close to each other. The detector may

“…ETA-II, ETA-IIM [18], and carry skip approximate adders [10,19] are based on segmentation that truncates carry propagation. Further, the probabilistic error analysis of these segment-based adders is presented in [20,21]. To increase the applicability of approximate designs, various accuracy configurable architectures are also presented, which are reviewed in the next subsection.…”
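The idea of segmentation that truncates carry propagation can be sketched as follows. This is a behavioural model in the spirit of ETA-II-style carry prediction, written under stated assumptions rather than taken from any of the cited designs: each k-bit segment receives a carry predicted from the adjacent lower segment alone, so a carry chain never spans more than two segments. The names `carry_predict_add` and `error_rate` are hypothetical, and the error rate is measured exhaustively against the exact sum modulo 2^n.

```python
def carry_predict_add(a: int, b: int, n: int = 8, k: int = 2) -> int:
    """Segmented n-bit addition (result modulo 2**n).

    Assumption for this sketch: the carry into a segment is generated by the
    previous segment's operand bits only, ignoring that segment's own carry-in.
    """
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    mask = (1 << k) - 1
    result = 0
    carry_in = 0  # nothing enters the least significant segment
    for shift in range(0, n, k):
        sa = (a >> shift) & mask
        sb = (b >> shift) & mask
        result |= ((sa + sb + carry_in) & mask) << shift
        # Predicted carry for the next segment: propagation is cut here.
        carry_in = (sa + sb) >> k
    return result


def error_rate(n: int, k: int) -> float:
    """Fraction of all operand pairs for which the approximate sum is wrong."""
    exact_mask = (1 << n) - 1
    wrong = sum(
        1
        for a in range(1 << n)
        for b in range(1 << n)
        if carry_predict_add(a, b, n, k) != ((a + b) & exact_mask)
    )
    return wrong / (1 << (2 * n))


if __name__ == "__main__":
    for k in (2, 4):
        print(f"n=8, k={k}: error rate = {error_rate(8, k):.4f}")
```

With only two segments (n = 8, k = 4) the predicted carry is always correct, because the lowest segment has no incoming carry to ignore; errors appear once a carry has to cross more than one segment boundary, as with k = 2.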

Section: Approximate Adder Architectures (mentioning)

confidence: 99%

A power and area efficient approximate carry skip adder for error resilient applications

Patel, Garg, Kumar

2020

Turk J Elec Eng & Comp Sci

Compute-intensive multimedia applications on portable devices require power- and area-efficient arithmetic units. The adder is a prime building block of these arithmetic units and limits the overall performance. Therefore, this paper analyzes the logic operations of state-of-the-art adders and presents a novel low-complexity adder segment with new carry prediction logic, obtained by removing redundant logic and sharing common operations. Further, a new power and area efficient approximate carry skip (PAEA-CSK) adder is proposed using the novel adder segment. The effectiveness of the proposed PAEA-CSK adder is evaluated and compared with existing adders by implementing them in VHDL and synthesizing them using the Synopsys Design Compiler with the 65 nm TSMC CMOS library. The synthesis results show that the proposed PAEA-CSK adder requires 27.28% less area and 18.03% less power than the existing carry skip-based approximate adder with the same accuracy. Further, the Sobel edge detector (SED) embedded with the proposed adder improves PSNR by a minimum of 16.94 dB over the SED embedded with a nonzeroing bit-truncation adder.

“…The algorithm improvement and hardware approximation can accomplish DNNs compression. The essence of DNNs compression is to take advantage of approximate weights or feature maps, approximate arithmetic [26] or approximate circuit [27][28][29] to realize convolution operations. The weight or feature map sparsification intends to eliminate the redundant weights or feature maps that contribute little to the accuracy of DNNs.…”

Section: Introduction (mentioning)

confidence: 99%

Neuromorphic processor-oriented hybrid Q-format multiplication with adaptive quantization for tiny YOLO3

Endoh

2023

Neural Comput & Applic

Deep neural networks (DNNs) have delivered unprecedented achievements in the modern Internet of Everything society, encompassing autonomous driving, expert diagnosis, unmanned supermarkets, etc. It continues to be challenging for researchers and engineers to develop a high-performance neuromorphic processor for deployment in edge devices or embedded hardware. DNNs’ superpower derives from their enormous and complex network architecture, which is computation-intensive, time-consuming, and energy-heavy. Due to the limited perceptual capacity of humans, accurate processing results from DNNs require a substantial amount of computing time, making them redundant in some applications. Utilizing adaptive quantization technology to compress the DNN model with sufficient accuracy is crucial for facilitating the deployment of neuromorphic processors in emerging edge applications. This study proposes a method to boost the development of neuromorphic processors by conducting fixed-point multiplication in a hybrid Q-format using an adaptive quantization technique on the convolution of tiny YOLO3. In particular, this work integrates the sign-bit check and bit roundoff techniques into the arithmetic of fixed-point multiplications to address overflow and roundoff issues within the convolution’s adding and multiplying operations. In addition, a hybrid Q-format multiplication module is developed to assess the proposed method from a hardware perspective. The experimental results prove that the hybrid multiplication with adaptive quantization on the tiny YOLO3’s weights and feature maps possesses a lower error rate than alternative fixed-point representation formats while sustaining the same object detection accuracy. Moreover, the fixed-point numbers represented by Q(6.9) have a suboptimal error rate, which can be utilized as an alternative representation form for the tiny YOLO3 algorithm-based neuromorphic processor design. In addition, the 8-bit hybrid Q-format multiplication module exhibits low power consumption and low latency in contrast to benchmark multipliers.
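To make the overflow and roundoff issues mentioned in this abstract concrete, here is a minimal sketch of a plain fixed-point multiplication with rounding and saturation. It is not the hybrid Q-format method of the paper. It assumes that Q(6.9) means one sign bit, 6 integer bits and 9 fractional bits in a 16-bit word, which the abstract does not state, and the helpers `to_q` and `q_mul` are hypothetical names.

```python
FRAC_BITS = 9    # assumed: fractional bits of Q(6.9)
TOTAL_BITS = 16  # assumed: 1 sign bit + 6 integer bits + 9 fractional bits


def saturate(value: int, total_bits: int = TOTAL_BITS) -> int:
    """Clamp to the signed range of the word instead of wrapping around."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, value))


def to_q(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Quantize a real number to a signed fixed-point integer."""
    return saturate(int(round(x * (1 << frac_bits))))


def q_mul(x: int, y: int, frac_bits: int = FRAC_BITS) -> int:
    """Multiply two fixed-point values that share the same format."""
    prod = x * y                    # carries 2 * frac_bits fractional bits
    prod += 1 << (frac_bits - 1)    # add half an LSB so the shift rounds
    return saturate(prod >> frac_bits)


if __name__ == "__main__":
    a, b = to_q(1.5), to_q(-2.25)
    print(q_mul(a, b) / (1 << FRAC_BITS))  # product scaled back to a float
    big = to_q(40.0)
    print(q_mul(big, big))                 # out of range, so it saturates
```

In this sketch the product of 1.5 and -2.25 is represented exactly as -3.375, while 40 times 40 exceeds the assumed range and is clamped to the largest representable value.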
