Dejice Jacob scite author profile

Singer

et al. 2016

Abstract-We describe an approach for developing a campuswide sensor network using commodity single board computers. We sketch various use cases for environmental sensor data, for different university stakeholders. Our key premise is that supersensors-sensors with significant compute capabilityenable more flexible data collection, processing and reaction. In this paper, we describe the initial prototype deployment of our supersensor system in a single department at the University of Glasgow.

Parsing Protocol Standards to Parse Standard Protocols

McQuistin

Band

et al. 2020

ALPyNA: acceleration of loops in Python for novel architectures

Singer

2019

We present ALPyNA, an automatic loop parallelization framework for Python, which analyzes data dependences within nested loops and dynamically generates CUDA kernels for GPU execution. The ALPyNA system applies classical dependence analysis techniques to discover and exploit potential parallelism. The skeletal structure of the dependence graph is determined statically (if possible) or at runtime; this is combined with type and bounds information discovered at runtime, to auto-generate high-performance kernels for offload to GPU.We demonstrate speedups of up to 1000x relative to the native CPython interpreter across four array-intensive numerical Python benchmarks. Performance improvement is related to both iteration domain size and dependence graph complexity. Nevertheless, this approach promises to bring the benefits of manycore parallelism to application developers.

Intelligent Traffic Control System

Bilal

2007

Python programmers have GPUs too: automatic Python loop parallelization with staged dependence analysis

Trinder

Singer

2019

Python is a popular language for end-user software development in many application domains. End-users want to harness parallel compute resources effectively, by exploiting commodity manycore technology including GPUs. However, existing approaches to parallelism in Python are esoteric, and generally seem too complex for the typical end-user developer. We argue that implicit, or automatic, parallelization is the best way to deliver the benefits of manycore to end-users, since it avoids domain-specific languages, specialist libraries, complex annotations or restrictive language subsets. Autoparallelization fits the Python philosophy, provides effective performance, and is convenient for non-expert developers. Despite being a dynamic language, we show that Python is a suitable target for auto-parallelization. In an empirical study of 3000+ open-source Python notebooks, we demonstrate that typical loop behaviour 'in the wild' is amenable to auto-parallelization. We show that staging the dependence analysis is an effective way to maximize performance. We apply classical dependence analysis techniques, then leverage the Python runtime's rich introspection capabilities to resolve additional loop bounds and variable types in a just-intime manner. The parallel loop nest code is then converted to CUDA kernels for GPU execution. We achieve orders of magnitude speedup over baseline interpreted execution and some speedup (up to 50x, although not consistently) over CPU JIT-compiled execution, across 12 loop-intensive standard benchmarks. CCS Concepts • Software and its engineering → Dynamic compilers; Scripting languages; Parallel programming languages; • Computer systems organization → Heterogeneous (hybrid) systems.