With the recent switch in the design of general-purpose processors from frequency scaling of a single core towards increasing the number of cores, parallel programming has become important not only for scientific computing but also for general-purpose programming. This shift also stressed the importance of programmability in existing parallel programming models, which were primarily designed for performance. It was soon recognized that new programming models were needed to make parallel programming accessible not only to experts, but to the general programming community. Transactional Memory (TM) is an example that follows this premise. It improves dramatically over previous synchronization mechanisms in terms of programmability and composability, at the price of possibly reduced performance. The main source of performance degradation in Transactional Memory is the overhead of transactional execution. Our work on parallelizing the Quake game engine is a clear example of this problem. We show that Software Transactional Memory is superior in terms of programmability compared to lock-based programming, but that performance is hindered by the extreme amount of overhead introduced by transactional execution. In the meantime, a significant research effort has been invested in overcoming this problem. Our approach aims to improve the performance of transactional code by reducing transactional data conflicts. The idea is based on an organization of the code in which highly conflicting data is promoted to dataflow tokens that coordinate the execution of transactions. The main contribution of this thesis is the Atomic Dataflow model (ADF), a new task-based parallel programming model for C/C++ that integrates dataflow abstractions into the shared-memory programming model. The ADF model provides language constructs that allow a programmer to delineate a program into a set of tasks and to explicitly define data dependencies for each task.
The task dependency information is conveyed to the ADF runtime system, which constructs a dataflow task graph that governs the execution of the program. Additionally, the ADF model allows tasks to share data. The key idea is that computation is triggered by dataflow between tasks but that, within a task, execution occurs by making atomic updates to common mutable state. To that end, the ADF model employs transactional memory, which guarantees the atomicity of shared-memory updates. The second contribution of this thesis is DaSH, the first comprehensive benchmark suite for hybrid dataflow and shared-memory programming models. DaSH features 11 benchmarks, each representing one of the Berkeley dwarfs, which capture patterns of communication and computation common to a wide range of emerging applications. DaSH includes sequential and shared-memory implementations based on OpenMP and TBB to facilitate easy comparison between hybrid dataflow implementations and traditional shared-memory implementations. We use DaSH not only to evaluate the ADF model, but also to compare it with two other hybrid dataflow models in order to identify the advantages and shortcomings of such models and to motivate further research on their characteristics. Finally, we study the applicability of hybrid dataflow models for parallelization of the game engine. We show that hybrid dataflow models decrease the complexity of the parallel game engine implementation by eliminating or restructuring the explicit synchronization that is necessary in shared-memory implementations. The corresponding implementations also exhibit good scalability and better speedup than the shared-memory parallel implementations, especially in the case of a highly congested game world that contains a large number of game objects. Ultimately, on an eight-core machine we were able to obtain a 4.72x speedup over the sequential baseline and to improve by 49% over the lock-based parallel implementation based on work-sharing.
Abstract. The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances in task-parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incurs significant runtime overheads, because the runtime must track task resources and use this information to schedule tasks while avoiding conflicts and races. We present SCOOP, a compiler that effectively integrates static and dynamic analysis in code generation. SCOOP combines context-sensitive points-to, control-flow, escape, and effect analyses to remove redundant dependence checks at runtime. Our static analysis can work in combination with existing dynamic analyses and task-parallel runtimes that use annotations to specify tasks and their memory footprints. We use our static dependence analysis to detect non-conflicting tasks and an existing dynamic analysis to handle the remaining dependencies. We evaluate the resulting hybrid dependence analysis on a set of task-parallel programs.
By introducing asynchronous lambdas, many programming languages have leaped ahead in the race for programmable manycore systems, leaving the operating system and its scheduler behind. Instead of hiding application-inherent parallelism behind pools of threads with opaque behavior, asynchronous lambdas allow programmers to explicitly state which parts of a program can be executed in parallel and when this form of parallelism is available. Introducing stretch as a universal performance metric and externalizing part of the lambda-provided knowledge not only to the runtime but also to the operating system scheduler, this paper tries to lay the foundation for OS scheduling to catch up on the road towards heterogeneous elastic manycore systems.