2021
DOI: 10.1145/3485486

Efficient automatic scheduling of imaging and vision pipelines for the GPU

Abstract: We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. We address the scalability challenge of extending search-based automatic scheduling to map large real-world programs to the deep hierarchies of memory and parallelism on GPU architectures in reasonable compile time. We achieve this using (1) a two-phase search…
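To make the abstract's setting concrete, here is a minimal sketch of the kind of input such an auto-scheduler consumes: a Halide algorithm with no scheduling directives at all. The example uses the standard Halide C++ front end and the canonical separable blur from the Halide literature; how a GPU auto-scheduler plugin is actually selected at compile time varies by Halide version and is not taken from this page.

```cpp
// Algorithm-only Halide pipeline (a 3x3 separable box blur).
// No schedule is written here: tiling, GPU block/thread mapping, and
// staging through shared memory would be chosen by the auto-scheduler.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(Float(32), 2, "input");
    Var x("x"), y("y");

    // Clamp accesses at the image border so the stencil is well defined.
    Func in = BoundaryConditions::repeat_edge(input);

    Func blur_x("blur_x"), blur_y("blur_y");
    blur_x(x, y) = (in(x - 1, y) + in(x, y) + in(x + 1, y)) / 3.0f;
    blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3.0f;

    // The schedule is left entirely to the compiler: in a generator-based
    // build one would request a GPU auto-scheduler by name when compiling
    // (the exact flag/plugin name depends on the Halide version).
    Pipeline p(blur_y);
    return 0;
}
```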

Cited by 13 publications (9 citation statements)
References 23 publications
“…In order to partition and schedule pipelines, current approaches rely on designing cost models to steer design space exploration [1,3]. For instance, the auto-scheduler in [1] explores over ten thousand schedules for a single CNN-layer Halide [24] pipeline.…”
Section: Introduction (mentioning)
confidence: 99%
“…Sophisticated cost models, some of them using ML-models themselves, have been proposed and used in [1,2,16,20,33,35,38]. These models, however, require extensive training for near-optimal solutions [3], are sensitive to changes in the execution environment (e.g., DVFS) and architectural parameters, need in-depth architectural knowledge for model updates, and do not consider the impact of heterogeneous or chiplet architectures. As heterogeneity at different levels of processing (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…Transformations like loop reordering or loop fusion can have large effects on constant factors in the runtime of dense tensor programs [3]. However, the same transformations can have even larger asymptotic effects when tensors are sparse.…”
Section: Introduction (mentioning)
confidence: 99%
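As a hedged aside (my own illustration in plain C++, not code from the citing paper or from Halide), the asymptotic point in the statement above can be seen in a sparse dot product: co-iterating the two nonzero lists (the "fused" form) costs O(nnz_a + nnz_b), while materializing dense temporaries and looping over the full dimension (the "unfused" form) costs O(n), which is asymptotically worse whenever the vectors are very sparse.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A sparse vector as a list of (index, value) pairs, sorted by index.
using SparseVec = std::vector<std::pair<std::size_t, double>>;

// Fused form: co-iterate the two nonzero lists (merge-style intersection).
// Work is proportional to nnz_a + nnz_b, independent of the dimension n.
double dot_fused(const SparseVec& a, const SparseVec& b) {
    double acc = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first == b[j].first) {
            acc += a[i].second * b[j].second;
            ++i;
            ++j;
        } else if (a[i].first < b[j].first) {
            ++i;
        } else {
            ++j;
        }
    }
    return acc;
}

// Unfused form: expand both operands into dense temporaries of length n,
// then run a dense loop over all n entries. Work is at least proportional
// to n even when almost every entry is zero.
double dot_unfused(const SparseVec& a, const SparseVec& b, std::size_t n) {
    std::vector<double> da(n, 0.0), db(n, 0.0);
    for (const auto& [idx, val] : a) da[idx] = val;
    for (const auto& [idx, val] : b) db[idx] = val;
    double acc = 0.0;
    for (std::size_t k = 0; k < n; ++k) acc += da[k] * db[k];
    return acc;
}
```

Both functions return the same value; only the iteration space differs, which is exactly the kind of asymptotic effect the quoted statement attributes to loop transformations over sparse tensors.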
“…The user might need to schedule too many kernels, or may be unfamiliar with the intricacies of the tensor compiler in question. Sparse systems must follow the lead of dense systems, which are now moving towards automatic scheduling [2,3,27]. Automatic scheduling promises a realistic path towards integration into high-level systems like SciPy [38] or TensorFlow [1].…”
Section: Introduction (mentioning)
confidence: 99%