X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks

Kim, Jonghong; Hwang, Kyuyeon; Sung, Wonyong

doi:10.1109/icassp.2014.6855060

Cited by 37 publications

(27 citation statements)

References 15 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…After the fine tuning of the parameters, the training with the boundary model was applied. Although the drop out [17] will improve the absolute performance of DNNs, we believe that it does not critically affect the relative results among the methods, as discussed in [7]. Table 3 specifies the WAs with different number of bits for discretization.…”

Section: Methodsmentioning

confidence: 99%

“…A fixed-point DNN, whose weights, bias parameters and middle layer inputs are linearly quantized to n bits, enables fast processing on a Very Large Scale Integration [7] or a CPU with Supplemental Streaming SIMD Extensions 3 (SSSE3) instruction set [8]. The parameters of fixed-point DNNs are trained by iterating the quantization of weights to n bits and usual back propagation [7,9].…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Acoustic model training based on node-wise weight boundary model increasing speed of discrete neural networks

Takeda

Komatani

Nakadai

2015

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

View full text Add to dashboard Cite

Our purpose is to realize discrete neural networks (NNs), whose some parameters are discretized, as a low-resource and fast NNs for acoustic models. Two essential problems should be tackled for its realization; 1) the reduction of discretization errors and 2) the implementation method for fast processing. We propose a new parameter training algorithm for 1) and an implementation using look-up table (LUT) on general-purpose CPUs for 2), respectively. The former can set proper boundaries of discretization at each node of NNs, resulting in the reduction of discretization error. The latter can reduce the memory usage of NNs within the cache size of CPU by encoding parameters of NNs. Experiments with 2-bit discrete NNs showed that our algorithm maintained almost the same word accuracy as 8-bit discrete NNs and achieved a 40% increase in speed of the NN's forward calculation.

show abstract

Section: Methodsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Acoustic model training based on node-wise weight boundary model increasing speed of discrete neural networks

Takeda

Komatani

Nakadai

2015

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

View full text Add to dashboard Cite

show abstract

“…Kim et al use reduced precision for a custom neural network circuit design [22]. However, this implementation lacks the configurability to run di↵erent networks at di↵er-ent precisions.…”

Section: Related Workmentioning

confidence: 98%

Proteus

Judd

Albericio

Hetherington

et al. 2016

Proceedings of the 2016 International Conference on Supercomputing

View full text Add to dashboard Cite

This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision numerical representations and specifically, their recently demonstrated ability to tolerate representations of di↵erent precision per layer while maintaining accuracy. This flexibility enables improvements over conventional DNN implementations that use a single, uniform representation. This work proposes Proteus, which reduces the data tra c and storage footprint needed by DNNs, resulting in reduced energy and improved area efficiency for DNN implementations. Proteus uses a di↵er-ent representation per layer for both the data (neurons) and the weights (synapses) processed by DNNs. Proteus is a layered extension over existing DNN implementations that converts between the numerical representation used by the DNN execution engines and the shorter, layer-specific fixedpoint representation used when reading and writing data values to memory be it on-chip bu↵ers or o↵-chip memory.Proteus uses a novel memory layout for DNN data, enabling a simple, low-cost and low-energy conversion unit.We evaluate Proteus as an extension to a state-of-the-art accelerator [7] which uses a uniform 16-bit fixed-point representation. On five popular DNNs Proteus reduces data tra c among layers by 43% on average while maintaining accuracy within 1% even when compared to a single precision floating-point implementation. As a result, Proteus improves energy by 15% with no performance loss. Proteus also reduces the data footprint by at least 38% and hence the amount of on-chip bu↵ering needed resulting in an implementation that requires 20% less area overall. This area savings can be used to improve cost by building smaller chips, to process larger DNNs for the same on-chip area, or to incorporate an additional three execution engines increasing peak performance bandwidth by 18%.

show abstract

“…The performances of the networks are shown in TABLE III. (N, K) is (16,4) for the structured sparse networks. Floatingpoint networks show the lowest error rate when normalizers are not applied.…”

Section: B Cifar-10mentioning

confidence: 99%

Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations

Boo

Sung

2017

2017 IEEE International Workshop on Signal Processing Systems (SiPS)

Self Cite

View full text Add to dashboard Cite

Deep neural networks (DNNs) usually demand a large amount of operations for real-time inference. Especially, fully-connected layers contain a large number of weights, thus they usually need many off-chip memory accesses for inference. We propose a weight compression method for deep neural networks, which allows values of +1 or -1 only at predetermined positions of the weights so that decoding using a table can be conducted easily. For example, the structured sparse (8,2) coding allows at most two non-zero values among eight weights. This method not only enables multiplication-free DNN implementations but also compresses the weight storage by up to x32 compared to floating-point networks. Weight distribution normalization and gradual pruning techniques are applied to mitigate the performance degradation. The experiments are conducted with fully-connected deep neural networks and convolutional neural networks.Index Terms-Deep neural networks, weight storage compression, structured sparsity, fixed-point quantization, network pruning. The hardware for inference contains a small look-up table for decompressing the code, but the procedure is very simple and deterministic. The data-path needs a reduced number of arithmetic units because most of the weights are pruned to zero. The indexing addresses can be easily interpreted to corresponding weights. Also, this method has a good scalability because the look-up table size is independent of the network complexity. However, training the structured sparsity network is more difficult than optimizing the conventional ternary valued networks. We use batch normalization and weight normalization techniques to mitigate the performance degradation. Also gradual pruning technique is applied for a large-sized network to improve the performance. The proposed scheme was evaluated on FCDNN, VGG-9, and AlexNet and obtained the compression rate between x23 and x32. The rest of this paper is organized as follows. Section II

show abstract

X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks

Cited by 37 publications

References 15 publications

Acoustic model training based on node-wise weight boundary model increasing speed of discrete neural networks

Acoustic model training based on node-wise weight boundary model increasing speed of discrete neural networks

Proteus

Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations

Contact Info

Product

Resources

About