Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2022
DOI: 10.1145/3503222.3507723

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures

Abstract: This work reveals that memory-intensive computation is a rising performance-critical factor in recent machine learning models. Due to a unique set of new challenges, existing ML optimizing compilers cannot perform efficient fusion under complex two-level dependencies combined with just-in-time demand. They face the dilemma of either performing costly fusion due to heavy redundant computation, or skipping fusion, which results in a massive number of kernels. Furthermore, they often suffer from low parallelism due to …
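To make the abstract's fusion dilemma concrete, here is a minimal NumPy sketch (the layer-norm-like op chain, tensor size, and epsilon constant are illustrative choices, not taken from the paper) of a memory-intensive chain whose elementwise consumers depend on reductions, i.e. a two-level dependency:

```python
import numpy as np

x = np.random.rand(1 << 20).astype(np.float32)

# Memory-intensive chain with a two-level dependency: the elementwise
# consumers depend on a reduction whose single value is broadcast back
# to every element. Executed op by op, each step is its own kernel and
# every intermediate array makes a round trip through global memory.
mu = x.mean()                        # kernel 1: reduction
centered = x - mu                    # kernel 2: full intermediate written back
var = np.mean(centered * centered)   # kernel 3: reduction over that intermediate
y = centered / np.sqrt(var + 1e-5)   # kernel 4: final elementwise pass

# The dilemma: fusing the consumers into one kernel forces each thread
# block to redundantly recompute the reductions it depends on, while
# skipping fusion leaves four kernel launches and three full-array
# round trips. (NumPy does not fuse at all; the sketch only shows the
# access pattern an ML compiler has to optimize.)
y_ref = (x - x.mean()) / np.sqrt(x.var() + 1e-5)
assert np.allclose(y, y_ref, atol=1e-4)
```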

Cited by 38 publications (4 citation statements) · References 46 publications
“…On ThunderX2, nDirect delivers slightly lower performance than Ansor for end-to-end inference, with a speedup of 0.88× to 0.98×. Ansor's better performance on the whole CNN is due to its ability to optimize across CNN layers through operator fusion [67,72]. This technique eliminates write-back operations for intermediate results and the corresponding fetch operations in the CNN pipeline, further reducing memory-access latency and bandwidth pressure to improve end-to-end CNN performance.…”
Section: End-to-end Inference Time
confidence: 99%
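As a rough illustration of the write-back/fetch elimination this citation describes, a NumPy sketch (the bias-plus-ReLU stage pair and the tensor shape are invented for illustration, not taken from the cited work):

```python
import numpy as np

act = np.random.rand(64, 56, 56).astype(np.float32)  # hypothetical activation tensor
bias = np.float32(0.1)

# Unfused pipeline: stage 1 writes a full intermediate tensor back to
# memory, and stage 2 must fetch it again.
tmp = act + bias                    # write-back of the intermediate
out_unfused = np.maximum(tmp, 0.0)  # extra fetch of that intermediate

# Fused pipeline: both ops are applied in one pass, so on a GPU the
# intermediate stays in registers and never touches global memory.
# (NumPy still materializes the temporary internally; the sketch only
# contrasts the two access patterns.)
out_fused = np.maximum(act + bias, 0.0)

# Full-tensor memory passes:
#   unfused: read act, write tmp, read tmp, write out -> 4 passes
#   fused:   read act, write out                      -> 2 passes
assert np.array_equal(out_unfused, out_fused)
```

Halving the number of full-tensor passes is exactly why fusion pays off for memory-bound stages: the arithmetic is unchanged, only the traffic shrinks.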
“…We use blocks as the granularity for adding exits, because the block is the basic unit/step that mathematically compresses information into a low-dimensional space [13]. In addition, many graph optimizations are performed between layers within the same block [14,53]. Adding an exit inside a block invalidates those graph optimizations and incurs a high cost of data movement between stages, as data dependencies within a block are strong.…”
Section: Building a Multi-exit Model
confidence: 99%
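To show what block-granularity exits look like in control flow, here is a minimal NumPy sketch (the dense-layer blocks, exit heads, dimensions, and confidence threshold are all hypothetical, not taken from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def block(x, w):
    """One 'block' of the network (hypothetical: a dense layer + ReLU).
    Graph optimizations such as operator fusion happen *within* a block,
    so exits are attached only at block boundaries."""
    return np.maximum(x @ w, 0)

def exit_head(x, w):
    """Lightweight classifier attached after a block; returns class
    probabilities via softmax."""
    logits = x @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical 3-block model with an exit after each block.
dims = [32, 32, 32, 32]
blocks = [rng.standard_normal((dims[i], dims[i + 1])) * 0.1 for i in range(3)]
exits = [rng.standard_normal((dims[i + 1], 10)) * 0.1 for i in range(3)]

def predict(x, threshold=0.6):
    """Run block by block; stop at the first exit whose top-1 confidence
    clears the threshold, skipping the remaining blocks."""
    for w_block, w_exit in zip(blocks, exits):
        x = block(x, w_block)
        probs = exit_head(x, w_exit)
        if probs.max() >= threshold:
            return probs.argmax(), probs.max()
    return probs.argmax(), probs.max()  # fall through to the final exit

label, conf = predict(rng.standard_normal(32))
```

Because every exit sits at a block boundary, each block's internal graph remains a single fusable unit for the compiler.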
“…MLModelCI [20] provided a one-stop platform for multimedia developers to deliver efficient ML services, while NSML [21] created a collaborative environment for users to deploy their own commercial services. AStitch [22] improved the execution efficiency of ML tasks through compiler optimization, avoiding unnecessary redundant computation. FedAMP [23] promoted pairwise collaboration between clients with similar data to significantly improve federated learning performance.…”
Section: Related Work
confidence: 99%