LCP: A Low-Communication Parallelization Method for Fast Neural Network Inference in Image Recognition

Preprint, 2020
DOI: 10.48550/arxiv.2003.06464

Cited by 1 publication (13 citation statements)
References 66 publications
“…Data parallelism is a straightforward method that allows independent data to be processed concurrently by duplicating devices that perform the same task or, in other words, that utilize the same DNN model. This could facilitate the reusability of data and increase the inference throughput [75]—the number of inferences per unit of time—nearly two-fold in the best-case scenario. Nonetheless, as evidenced by its single occurrence in the corpus analyzed [72]—shown in the “Parallelism” column in Table 3—implementing and applying such a parallelism scheme may not be effective for devices with limited resources or scenarios involving IoT devices because it does not change the computation and memory footprint per node [75, 76] and it lacks the flexibility required to adjust the amount of computation on each node [76].…”
Section: DNN Partitioning and Parallelism for Collaborative Inference
confidence: 99%
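As a rough illustration of the data-parallel scheme described in the statement above, the following minimal sketch replicates a toy single-layer model on two simulated devices and splits an input batch between them. The toy model, sizes, and names such as replica_infer are illustrative assumptions, not taken from the cited papers; note that each replica still holds the full weight matrix, which is why the per-node memory footprint is unchanged even though throughput improves.

    # Data-parallel inference sketch: same model on every "device", disjoint data slices.
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    rng = np.random.default_rng(0)
    W = rng.standard_normal((3072, 10))   # one shared weight matrix acts as the toy "model"

    def replica_infer(batch_slice: np.ndarray) -> np.ndarray:
        # Every replica keeps a full copy of W, so per-device memory/compute per input
        # is unchanged; only the data is partitioned, which raises throughput.
        return batch_slice @ W

    images = rng.standard_normal((8, 3072))          # a batch of 8 flattened images
    slices = np.array_split(images, 2)               # two simulated devices
    with ThreadPoolExecutor(max_workers=2) as pool:
        outputs = list(pool.map(replica_infer, slices))
    logits = np.concatenate(outputs)                 # reassemble the batch results
    assert logits.shape == (8, 10)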
“…Model parallelism [70, 71, 72, 73, 74, 75, 76, 78, 80, 81, 83, 84, 86, 87, 89] attempts to address both issues by employing partitioning strategies with a finer granularity to produce less expensive DNN subtasks, i.e., partitions with fewer parameters and fewer computation requirements than a layer, and foster more adaptable co-inference schemes. The computations required for a single input are distributed across multiple computing entities, reducing the time needed to process the shared input [76] but delivering a performance that, as opposed to what has been indicated for pipeline parallelism, is highly dependent on the distribution of such computations across devices.…”
Section: DNN Partitioning and Parallelism for Collaborative Inference
confidence: 99%
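For contrast, the sketch below illustrates the finer-grained model-parallel idea from the statement above: a single layer's weights are split column-wise across two simulated devices that process the same input, and the partial outputs are then gathered. This is a minimal assumption-laden illustration, not the partitioning scheme proposed in LCP; the names partial_infer and the gather step are hypothetical, and the gather highlights the communication cost that such schemes must pay between partitions.

    # Model-parallel inference sketch: one layer's weights partitioned across "devices",
    # each computing part of the output for the SAME input.
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    rng = np.random.default_rng(1)
    W = rng.standard_normal((3072, 10))
    W_parts = np.array_split(W, 2, axis=1)           # each device stores roughly half the weights

    def partial_infer(w_part: np.ndarray, x: np.ndarray) -> np.ndarray:
        # Smaller weight slice -> lower memory and compute per node, but the partial
        # outputs must be communicated and gathered before the next layer.
        return x @ w_part

    x = rng.standard_normal((1, 3072))               # a single input image
    with ThreadPoolExecutor(max_workers=2) as pool:
        parts = list(pool.map(lambda w: partial_infer(w, x), W_parts))
    logits = np.concatenate(parts, axis=1)           # gather step (communication)
    assert logits.shape == (1, 10)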