The performance of SIMD processors is often limited by the time it takes to transfer data between the centralized control unit and the parallel processor array. This is especially true of hybrid SIMD models, such as associative computing, that make extensive use of global search operations. Pipelining instruction broadcast can help, but is not enough to solve the problem, especially for massively parallel processors with thousands of processing elements. In this paper, we describe a SIMD processor architecture that combines a fully pipelined broadcast/reduction network with hardware multithreading to reduce performance degradation as the number of processors is scaled up.
This paper proposes a solution to air traffic control (ATC) using an enhanced SIMD machine model called an Associative Processor (AP). Our solution differs from previous ATC systems, which are designed for MIMD computers and have great difficulty meeting the predictability requirements for ATC. These requirements are critical for meeting the strict certification standards that apply to safety-critical software components. The proposed AP solution supports accurate predictions of worst-case execution times and guarantees that all deadlines are met. Furthermore, the software developed for the AP model is much simpler and smaller than the corresponding current ATC software. Because the associative processor is built from SIMD hardware, it is also considerably cheaper and simpler than the MIMD hardware currently used to support ATC. We have designed a prototype for eight real-time ATC tasks on a ClearSpeed CSX600 accelerator, which is used to emulate the AP. Performance is evaluated in terms of execution time and predictability and is compared to the fastest host-only version, implemented using OpenMP on an 8-core multiprocessor (MIMD). Our extensive experiments show that the AP implementation meets all deadlines and can be statically scheduled. In contrast, some tasks miss their deadlines when implemented on MIMD. The results show that the proposed AP solution supports accurate and meaningful predictions of worst-case execution times and guarantees that all deadlines are met.
The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading is able to improve performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the choice of thread scheduling policy used by the hardware is critical in determining the actual utilization achieved. We consider three thread scheduling policies and show that a thread scheduler that avoids issuing threads that will stall due to pipeline dependencies or thread synchronization operations is able to maintain system utilization independent of the number of threads.
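The effect of the scheduling policy on utilization can be illustrated with a toy model. The sketch below is not the simulator or any of the three policies from the paper. It assumes a single-issue control unit, a fully pipelined network, and that every instruction of a thread must wait a fixed `latency` cycles for the previous one to complete. The function name `simulate` and all of its parameters are hypothetical.

```python
def simulate(num_threads, latency, cycles, skip_stalled):
    """Return the fraction of cycles in which an instruction was issued.

    Toy assumptions (not from the paper): one issue slot per cycle, and a
    thread's next instruction becomes ready `latency` cycles after its
    previous instruction was issued.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    issued = 0
    pointer = 0  # round-robin position

    for cycle in range(cycles):
        if skip_stalled:
            # Dependency-aware policy: starting at the pointer, pick the
            # first thread whose next instruction would not stall.
            order = [(pointer + i) % num_threads for i in range(num_threads)]
            chosen = next((t for t in order if ready_at[t] <= cycle), None)
            if chosen is not None:
                pointer = (chosen + 1) % num_threads
        else:
            # Naive round-robin: the slot belongs to the thread at the
            # pointer; if that thread would stall, the slot is wasted.
            chosen = pointer if ready_at[pointer] <= cycle else None
            pointer = (pointer + 1) % num_threads

        if chosen is not None:
            issued += 1
            ready_at[chosen] = cycle + latency

    return issued / cycles


if __name__ == "__main__":
    for threads in (2, 5, 8):
        naive = simulate(threads, latency=8, cycles=8000, skip_stalled=False)
        aware = simulate(threads, latency=8, cycles=8000, skip_stalled=True)
        print(f"threads={threads}  naive={naive:.3f}  aware={aware:.3f}")
```

In this model, once there are at least as many threads as cycles of latency, both policies keep the issue slot busy. With fewer threads, the naive policy wastes slots on threads that are still waiting, while the dependency-aware policy issues whenever any thread is ready.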