2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2016.7446050
TABLA: A unified template-based framework for accelerating statistical machine learning

Abstract: A growing number of commercial and enterprise systems increasingly rely on compute-intensive machine learning algorithms. While the demand for these compute-intensive applications is growing, the performance benefits from general-purpose platforms are diminishing. Field Programmable Gate Arrays (FPGAs) provide a promising path forward to accommodate the needs of machine learning algorithms and represent an intermediate point between the efficiency of ASICs and the programmability of general-purpose processors…
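The premise behind the abstract, developed in the paper, is that a broad class of statistical ML algorithms learn via stochastic gradient descent over an algorithm-specific objective, so a single hardware template can serve them all: the programmer supplies only the gradient of the objective. As a minimal software analogue of that idea (illustrative names only, not TABLA's actual interface), the sketch below shows a generic SGD loop parameterized by a pluggable gradient function, with logistic regression as the plugged-in learner:

```python
# Minimal sketch (not TABLA's API): many statistical ML learners reduce to
# the same stochastic-gradient-descent template; only the gradient of the
# objective function differs between them.
import numpy as np

def sgd(gradient, w, data, lr=0.1, epochs=10):
    """Generic SGD template: the learner is fully defined by `gradient`."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * gradient(w, x, y)   # w_{t+1} = w_t - mu * dL/dw
    return w

def logistic_gradient(w, x, y):
    """Example learner plugged into the template: logistic regression."""
    p = 1.0 / (1.0 + np.exp(-w @ x))      # predicted probability
    return (p - y) * x                     # gradient of the cross-entropy loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
    w = sgd(logistic_gradient, np.zeros(3), list(zip(X, y)))
    print("learned weights:", w)
```

Swapping `logistic_gradient` for the gradient of a linear-regression, SVM, or other objective changes the learner without touching the loop, which is the uniformity a template-based accelerator generator exploits.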

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1

Citation Types

1
60
0

Year Published

2016
2016
2023
2023

Publication Types

Select...
6
2
1

Relationship

1
8

Authors

Journals

Cited by 138 publications (61 citation statements)
References 47 publications
“…The only amendment we made in the device architecture was to increase the capacity of I/O pads from 2 to 4, as our benchmarks are heavily I/O bound. Our benchmarks include Tabla [13], DnnWeaver [14], DianNao [9], Stripes [45], and Proteus [46], which are general neural network acceleration frameworks capable of optimizing various objective functions through gradient descent by supporting huge […]. Figure 10 compares the achieved power gain of different voltage scaling approaches implemented in the Tabla acceleration framework under a varying workload. We considered a synthetic workload with 40% average load (of the maximum) from [47], with λ = 1000, H = 0.76, and IDC = 500, where λ, H (0.5 < H ≤ 1), and IDC denote the average arrival rate of the whole process, the Hurst exponent, and the index of dispersion, respectively.…”
Section: A. General Setup
confidence: 99%
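For readers unfamiliar with the parameters quoted above: λ is the mean arrival rate, H is the Hurst exponent (a measure of self-similarity, i.e. burstiness across time scales), and IDC is the index of dispersion for counts. The sketch below, with helper names of my own invention, shows one conventional way to estimate these statistics from a trace of per-interval arrival counts, e.g. to check a generated workload against the quoted targets (λ = 1000, H = 0.76, IDC = 500):

```python
# Illustrative sketch (helper names are mine, not from the cited setup):
# estimate lambda, H, and IDC from a trace of per-interval arrival counts.
import numpy as np

def arrival_rate(counts, interval):
    """Mean arrivals per unit time: lambda = E[N] / interval."""
    return counts.mean() / interval

def idc(counts, block):
    """Index of dispersion for counts over windows of `block` intervals:
    IDC = Var[N(t)] / E[N(t)]."""
    n = len(counts) // block
    windows = counts[:n * block].reshape(n, block).sum(axis=1)
    return windows.var() / windows.mean()

def hurst_aggvar(counts, scales=(1, 2, 4, 8, 16, 32, 64)):
    """Aggregated-variance estimate of H: for self-similar traffic,
    Var[mean over m intervals] ~ m^(2H - 2), so H = 1 + slope / 2."""
    variances = []
    for m in scales:
        n = len(counts) // m
        agg = counts[:n * m].reshape(n, m).mean(axis=1)
        variances.append(agg.var())
    slope = np.polyfit(np.log(scales), np.log(variances), 1)[0]
    return 1.0 + slope / 2.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(lam=1000, size=4096)  # placeholder Poisson trace
    print("lambda ~", arrival_rate(counts, interval=1.0))
    print("IDC    ~", idc(counts, block=64))   # ~1 for Poisson, 500 target
    print("H      ~", hurst_aggvar(counts))    # ~0.5 for Poisson, 0.76 target
```

A plain Poisson trace yields H ≈ 0.5 and IDC ≈ 1; hitting the quoted H = 0.76 and IDC = 500 requires a long-range-dependent generator such as the one cited as [47] in the quote.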
“…Unfortunately, they are limited to a specific subset of applications, while the applications and/or implementations of data centers evolve at a high pace. Thanks to their relatively low power consumption, fine-grained parallelism, and programmability, Field-Programmable Gate Arrays (FPGAs) have in the last few years shown great performance in various applications [10], [11], [12], [13], [14]. Therefore, they have been integrated into data centers to accelerate data center applications.…”
Section: Introduction
confidence: 99%
“…Once the analyst imports the dana package, she can express the required variables. The code snippet below declares a multidimensional ML model of size [5][2] using the dana.model construct.…”
Section: Language Constructs
confidence: 99%
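The snippet referred to in the quote is not reproduced in this report. Purely as a hypothetical illustration, the sketch below shows what declaring a [5][2] model with a dana.model construct might look like; only the construct name and the model size come from the quoted text, and the stand-in class exists solely so the example runs without the real package, whose API may differ:

```python
# Hypothetical sketch: `dana.model` and the [5][2] size come from the quote;
# the stand-in implementation below is mine, not the real dana package.
import numpy as np

class _Dana:
    @staticmethod
    def model(shape, name="model"):
        """Stand-in for dana.model: declares an ML model variable with the
        given dimensions, initialized to zeros."""
        return {"name": name, "shape": shape, "weights": np.zeros(shape)}

dana = _Dana()

# Declare a multidimensional ML model of size [5][2], as in the quote.
w = dana.model((5, 2), name="w")
print(w["name"], w["shape"])
```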
“…Large-scale neural networks are both memory-intensive and computation-intensive, thereby posing stringent requirements on computing platforms when deploying those large-scale neural network models on memory-constrained and energy-constrained embedded devices. In order to overcome these limitations, hardware acceleration of deep neural networks has been extensively investigated in both industry and academia [1], [2], [3], [4], [5], [6], [7], [8]. These hardware accelerators are based on FPGA and ASIC devices and can achieve a significant improvement in energy efficiency, along with a small form factor, compared with traditional CPU- or GPU-based computing of deep neural networks.…”
Section: Introduction
confidence: 99%