2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)
DOI: 10.1109/pact.2019.00021
MOSAIC: Heterogeneity-, Communication-, and Constraint-Aware Model Slicing and Execution for Accurate and Efficient Inference

Cited by 25 publications (23 citation statements) | References 37 publications
“…To investigate the deep learning training workload, we profiled the latency of each step in the network model training. In FP, BP, and UP, we confirmed that the latency of each step is different depending on the characteristics of the layer, the size of the filter, and the size of the input data [22, 24, 25]. We profiled the execution time of the CPU and GPU while training the ResNet-18 model, as depicted in Figure 6.…”
Section: Methods
Citation type: mentioning (confidence: 79%)
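The profiling described in the statement above times the forward pass (FP), backward pass (BP), and parameter update (UP) separately on each processor. The following is a minimal sketch of such per-step timing, assuming PyTorch and torchvision's ResNet-18; it is illustrative only and is not the profiling harness used in the cited work.

```python
# Per-step training-latency profiling sketch: forward pass (FP),
# backward pass (BP), and parameter update (UP) on CPU vs. GPU.
# Assumes PyTorch and torchvision; batch size and iteration count are arbitrary.
import time
import torch
import torchvision

def profile_steps(device, batch_size=32, iters=10):
    model = torchvision.models.resnet18(num_classes=10).to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    y = torch.randint(0, 10, (batch_size,), device=device)

    def sync():
        # GPU kernels are asynchronous, so synchronize before reading the clock.
        if device.type == "cuda":
            torch.cuda.synchronize()

    fp = bp = up = 0.0
    for _ in range(iters):
        sync(); t0 = time.perf_counter()
        loss = criterion(model(x), y)      # forward pass (FP)
        sync(); t1 = time.perf_counter()
        optimizer.zero_grad()
        loss.backward()                    # backward pass (BP)
        sync(); t2 = time.perf_counter()
        optimizer.step()                   # parameter update (UP)
        sync(); t3 = time.perf_counter()
        fp += t1 - t0; bp += t2 - t1; up += t3 - t2

    print(f"{device}: FP {fp/iters:.4f}s  BP {bp/iters:.4f}s  UP {up/iters:.4f}s")

profile_steps(torch.device("cpu"))
if torch.cuda.is_available():
    profile_steps(torch.device("cuda"))
```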
“…Considering the effective use of the available computing power of mobile devices, some reports propose methods to execute each layer of the network on different processors. DeepX [21] and Mosaic [25] distribute the execution of model layers to different computing resources such as the CPU, GPU, DSP, or neural processing unit (NPU). In contrast, layer [22] proposed the execution of a single neural network layer using both the CPU and GPU.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
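The layer-distribution idea referenced above can be illustrated with a small sketch that places different slices of a model on different processors, assuming PyTorch. The device assignment here is hand-picked for illustration and is not the partitioning algorithm of MOSAIC, DeepX, or the other cited systems.

```python
# Layer-wise model slicing sketch: one slice runs on one processor,
# the remaining slice on another. Assumes PyTorch; the split point and
# devices are illustrative, not a heterogeneity-aware schedule.
import torch
import torch.nn as nn

class SlicedModel(nn.Module):
    def __init__(self, dev_a, dev_b):
        super().__init__()
        # First slice (feature extraction) assigned to processor A.
        self.slice_a = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        ).to(dev_a)
        # Second slice (classifier) assigned to processor B.
        self.slice_b = nn.Sequential(
            nn.Flatten(), nn.Linear(16 * 16 * 16, 10),
        ).to(dev_b)
        self.dev_a, self.dev_b = dev_a, dev_b

    def forward(self, x):
        x = self.slice_a(x.to(self.dev_a))
        # The activation copy below is the inter-processor communication cost
        # that slicing-aware schedulers weigh against per-processor speed.
        return self.slice_b(x.to(self.dev_b))

dev_a = torch.device("cpu")
dev_b = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = SlicedModel(dev_a, dev_b)
out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 10])
```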
“…The community has even started declaring victory in achieving human-level accuracies for certain ML-based tasks (e.g., image recognition and classification) [25, 26]. Built upon such outstanding advances, the industry is now moving toward integrating the ML algorithms into various types of real-world applications and deploying the applications on the edge platforms [4, 22, 23, 29, 33, 35, 38, 47, 63, 65]. In many real-world scenarios, the edge platforms often serve multiple purposes and have to handle different types of inference requests for different ML models at the same time.…”
Section: Inference in Edge Platforms
Citation type: mentioning (confidence: 99%)