2021
DOI: 10.1145/3485486

Efficient automatic scheduling of imaging and vision pipelines for the GPU

Abstract: We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. We address the scalability challenge of extending search-based automatic scheduling to map large real-world programs to the deep hierarchies of memory and parallelism on GPU architectures in reasonable compile time. We achieve this using (1) a two-phase search…
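To make the abstract's setting concrete, here is a minimal sketch of the kind of input such an auto-scheduler consumes: a Halide algorithm with no scheduling directives at all. The example uses the standard Halide C++ front end and the canonical separable blur from the Halide literature; how a GPU auto-scheduler plugin is actually selected at compile time varies by Halide version and is not taken from this page.

```cpp
// Algorithm-only Halide pipeline (a 3x3 separable box blur).
// No schedule is written here: tiling, GPU block/thread mapping, and
// staging through shared memory would be chosen by the auto-scheduler.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(Float(32), 2, "input");
    Var x("x"), y("y");

    // Clamp accesses at the image border so the stencil is well defined.
    Func in = BoundaryConditions::repeat_edge(input);

    Func blur_x("blur_x"), blur_y("blur_y");
    blur_x(x, y) = (in(x - 1, y) + in(x, y) + in(x + 1, y)) / 3.0f;
    blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3.0f;

    // The schedule is left entirely to the compiler: in a generator-based
    // build one would request a GPU auto-scheduler by name when compiling
    // (the exact flag/plugin name depends on the Halide version).
    Pipeline p(blur_y);
    return 0;
}
```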

Cited by 13 publications (9 citation statements)
References 23 publications
“…In order to partition and schedule pipelines, current approaches rely on designing cost models to steer design space exploration [1,3]. For instance, the auto-scheduler in [1] explores over ten thousand schedules for a single CNN-layer Halide [24] pipeline.…”
Section: Introduction (mentioning)
confidence: 99%
“…Sophisticated cost models, some of them using ML-models themselves, have been proposed and used in [1,2,16,20,33,35,38]. These models, however, require extensive training for near-optimal solutions [3], are sensitive to changes in the execution environment (e.g., DVFS) and architectural parameters, need in-depth architectural knowledge for model updates, and do not consider the impact of heterogeneous or chiplet architectures. As heterogeneity at different levels of processing (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…Transformations like loop reordering or loop fusion can have large effects on constant factors in the runtime of dense tensor programs [3]. However, the same transformations can have even larger asymptotic effects when tensors are sparse.…”
Section: Introduction (mentioning)
confidence: 99%
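As a hedged aside (my own illustration in plain C++, not code from the citing paper or from Halide), the asymptotic point in the statement above can be seen in a sparse dot product: co-iterating the two nonzero lists (the "fused" form) costs O(nnz_a + nnz_b), while materializing dense temporaries and looping over the full dimension (the "unfused" form) costs O(n), which is asymptotically worse whenever the vectors are very sparse.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A sparse vector as a list of (index, value) pairs, sorted by index.
using SparseVec = std::vector<std::pair<std::size_t, double>>;

// Fused form: co-iterate the two nonzero lists (merge-style intersection).
// Work is proportional to nnz_a + nnz_b, independent of the dimension n.
double dot_fused(const SparseVec& a, const SparseVec& b) {
    double acc = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first == b[j].first) {
            acc += a[i].second * b[j].second;
            ++i;
            ++j;
        } else if (a[i].first < b[j].first) {
            ++i;
        } else {
            ++j;
        }
    }
    return acc;
}

// Unfused form: expand both operands into dense temporaries of length n,
// then run a dense loop over all n entries. Work is at least proportional
// to n even when almost every entry is zero.
double dot_unfused(const SparseVec& a, const SparseVec& b, std::size_t n) {
    std::vector<double> da(n, 0.0), db(n, 0.0);
    for (const auto& [idx, val] : a) da[idx] = val;
    for (const auto& [idx, val] : b) db[idx] = val;
    double acc = 0.0;
    for (std::size_t k = 0; k < n; ++k) acc += da[k] * db[k];
    return acc;
}
```

Both functions return the same value; only the iteration space differs, which is exactly the kind of asymptotic effect the quoted statement attributes to loop transformations over sparse tensors.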
“…The user might need to schedule too many kernels, or may be unfamiliar with the intricacies of the tensor compiler in question. Sparse systems must follow the lead of dense systems, which are now moving towards automatic scheduling [2,3,27]. Automatic scheduling promises a realistic path towards integration into high-level systems like SciPy [38] or TensorFlow [1].…”
Section: Introduction (mentioning)
confidence: 99%