Proceedings of the 42nd Annual International Symposium on Computer Architecture 2015
DOI: 10.1145/2749469.2749472
DjiNN and Tonic

Abstract: As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web-service companies are adopting large deep neural networks (DNNs) for machine learning challenges such as image processing, speech recognition, and natural language processing, among others. A number of open questions arise as to the design of a server platform specialized for DNNs and how modern warehouse-scale computers (WSCs) should be outfitted to provide DNN as a service for these applications. In this…

Cited by 124 publications (3 citation statements)
References 32 publications
“…There is also a large body of work accelerating machine learning-based applications using various accelerator platforms [1, 7–9, 12–14, 19, 23–25, 30, 31, 36, 38, 41, 42, 58, 63, 80]. Specifically, GPUs have been shown to offer orders of magnitude performance improvement over multicore CPUs [24, 25, 27, 58]. This is because many machine learning algorithms spend a large fraction of their execution time performing matrix multiplication, which can be parallelized on the large number of threads offered by GPUs.…”
Section: Related Work
confidence: 99%
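The statement above attributes the GPU speedup to matrix multiplication dominating DNN execution time. A minimal NumPy sketch (illustrative only; the layer shapes and names are made up, not from the cited works) shows how a fully connected layer's forward pass reduces to exactly one matrix multiplication, the operation GPUs parallelize across thousands of threads:

```python
import numpy as np

def dense_layer(x, weights, bias):
    """Forward pass of one dense layer: a single matmul plus a ReLU.
    The x @ weights product is the work a GPU would parallelize."""
    return np.maximum(x @ weights + bias, 0.0)

# A batch of 64 inputs with 256 features through a 256 -> 128 layer.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 256))
w = rng.standard_normal((256, 128))
b = np.zeros(128)

y = dense_layer(x, w, b)
print(y.shape)  # (64, 128)
```

Each of the 64 × 128 output elements is an independent dot product, which is why the computation maps naturally onto a GPU's thread grid.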
“…To improve a WSC's performance and cost-effectiveness, a large volume of studies discuss resource- and performance-aware job scheduling [18–21, 36–38, 44, 49, 52–54, 56], adaptive power management [29, 31, 39, 40, 42], novel architectures [9, 10, 27], and systematic performance investigation methods [2, 17, 22, 25, 30, 33, 47, 48, 50]. Unfortunately, to the best of our knowledge, there is no standard methodology to evaluate the holistic performance of a WSC running thousands of distinct jobs.…”
Section: Need for a Systematic Performance Evaluation Methodology
confidence: 99%
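To make "resource-aware job scheduling" concrete, here is a hedged sketch of one simple policy: greedily place each job on the machine with the most free capacity. This is a generic worst-fit heuristic for illustration only; the `schedule` function and its inputs are hypothetical and not drawn from any of the cited scheduling systems.

```python
def schedule(jobs, capacities):
    """Place jobs on machines, largest demand first, always choosing
    the machine with the most remaining capacity (worst-fit).

    jobs:       list of (name, resource_demand) tuples
    capacities: list of free capacity per machine
    Returns {job_name: machine_index}; raises if a job cannot fit."""
    free = list(capacities)
    placement = {}
    for name, demand in sorted(jobs, key=lambda j: -j[1]):
        best = max(range(len(free)), key=lambda m: free[m])
        if free[best] < demand:
            raise RuntimeError(f"no machine can fit job {name!r}")
        free[best] -= demand
        placement[name] = best
    return placement

print(schedule([("a", 4), ("b", 3), ("c", 2)], [5, 5]))
```

Worst-fit spreads load evenly, which helps tail latency; real WSC schedulers layer many more constraints (priorities, affinity, preemption) on top of heuristics like this.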
“…In order to achieve high-efficiency inference for these immersive services on mobile devices, we can offload some computation tasks to effectively leverage computing resources in edge servers and cloud servers. Coincidentally, we find that the deep neural network (DNN) is the most common ML technique [12], and the DNN model can be split into different portions with layer-level partitioning [13]. Therein, partial offloading can sometimes outperform binary offloading because an internal layer inside the DNN model usually yields a smaller intermediate output than the input layer.…”
Section: Introduction
confidence: 99%
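The argument above (partial offloading wins when an internal layer's activation is smaller than the raw input) can be sketched as a split-point search. This is an illustrative toy, assuming we only minimize uploaded bytes; the layer sizes and the `best_split` helper are invented, not taken from the cited partitioning work:

```python
def best_split(output_sizes):
    """output_sizes[i] = bytes produced by layer i of the DNN.
    Splitting after layer i means the device uploads output_sizes[i]
    bytes to the server. Return (layer_index, bytes_uploaded) that
    minimizes the upload."""
    return min(enumerate(output_sizes), key=lambda p: p[1])

# Raw input is 600 KB; internal activations shrink it considerably.
sizes = [600_000, 150_000, 40_000, 90_000, 4_000]
layer, cost = best_split(sizes)
print(layer, cost)  # split after layer 4: upload 4000 bytes
```

A real partitioner would also weigh per-layer compute cost on the device versus the server and the link bandwidth, not just transfer size.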