INTRODUCTION

Rapid advancements in multicore and chip-level multithreading technologies open new challenges and make multicore and manycore systems a part of the computing landscape. From high-end servers to mobile phones, multicores and manycores are steadily entering every aspect of information technology.

However, most programmers are trained in sequential programming, and most existing parallel programming models are prone to errors such as data races and deadlocks. Therefore, to fully use multicore and manycore hardware, there is an urgent need for parallel programming models that allow an easy transition from sequential to parallel programs, deliver good performance, and enable the development of error-free code.
THEMES OF THIS SPECIAL ISSUE

This special issue contains research papers addressing the state-of-the-art technologies related to multicore and manycore systems. The set of accepted papers can be organized under the following key themes: Programming Models, Performance Improvements, and Applications.
Programming models

There are several developments in programming models that allow automated parallelization of code and eliminate, or at least detect, programming errors such as data races. The paper 1 proposes a model in which a function calls other functions through communication channels. This completely eliminates the passing of state between caller and callee functions, making the results deterministic. As a result, the underlying hardware can automate parallelization of the code by spawning these callee functions as tasks that run concurrently with the parent function as hardware cores become available. In this way, the model allows automatic, data race-free parallelization of existing applications that can scale well on manycore hardware.

As multicore and manycore systems proliferate in the market, it is common to parallelize existing applications with shared memory models, where access to shared variables by multiple threads is managed by synchronization primitives and/or lock-free data mechanisms. However, it is challenging to use these interfaces appropriately. As a result, data races often occur, and they are difficult to detect and reproduce. Race detectors such as Intel Cilkscreen can be used to detect data races, but they often introduce performance penalties and give false positives if they are unaware of the semantics of the underlying lock-free structures. To mitigate this issue, the paper 2 extends the race detector ThreadSanitizer with the semantics of two lock-free data structures: the Single-Producer/Single-Consumer (SPSC) and the Multiple-Producer/Multiple-Consumer (MPMC) queues. Experimental results demonstrate that these improvements eliminate 60% of the false-positive warnings and can accurately detect incorrect use of these data race-free structures.

To improve programmability on manycore architectures, several high-level programming models have been proposed, such as Kokkos, RAJA, OpenACC, and OpenMP 4.0.
The paper 3 benchmarks these programming models against mature low-level programming models CUDA and Ope...