Multiple-input multiple-output (MIMO) systems have attracted considerable attention in wireless communications because they offer a significant increase in data throughput and link coverage without requiring additional bandwidth or increased transmit power. The price that has to be paid is the increased complexity of hardware components and algorithms. The sphere detector (SD) algorithm solves the problem of maximum likelihood (ML) detection for MIMO channels by significantly reducing the search space of possible solutions. The main drawback of the SD algorithm is its sequential nature; consequently, running it on massively parallel architectures (MPAs) is very inefficient. To overcome this drawback, a new parallel sphere detector (PSD) algorithm is proposed. It implements a novel hybrid tree search method, in which parallelism is ensured by an efficient combination of depth-first search and breadth-first search. A path metric-based parallel sorting is employed at each intermediate stage. The PSD algorithm is able to adjust its memory requirements and extent of parallelism to fit a wide range of parallel architectures. Mapping details for MPAs are given by describing the thread-dependent, highly parallel building blocks of the algorithm. Based on these building blocks, a mapping to a general-purpose graphics processing unit is provided, and its performance is evaluated. To achieve high throughput, several levels of parallelism are introduced, and different scheduling strategies are considered.

In the first approach, the robustness of MIMO is maximized, that is, the probability of error is minimized with the use of space-time codes (STCs).
STCs rely on transmitting different representations of the same data stream on different parallel transmit branches, that is, they introduce controlled redundancy in both space and time.

Spatial multiplexing (SM), the second approach, focuses on maximizing the capacity of a radio link by transmitting independent data streams on different transmit branches simultaneously and within the same frequency band. The price that has to be paid is the increased complexity of detection hardware components and algorithms. The complexity of detection algorithms depends on many factors, such as antenna configuration, modulation order, channel, and coding.

With regard to bit error rate (BER), the maximum likelihood (ML) detector offers the best performance; however, its exponential complexity makes it unsuitable for real-time applications. The SD algorithm has been proposed in the literature to significantly reduce the search space of possible solutions while still providing the ML solution. For a few good examples, refer to [2-4] and [5].

In non-optimal detectors, the complexity of the sphere detector (SD) algorithm is reduced by introducing some approximations such as (i) early termination of the search, (ii) introducing constraints on the maximum number of nodes that the detector algorithm is allowed ...
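
To make the earlier claim concrete, that the SD algorithm shrinks the search space while still returning the ML solution, the following is a minimal sketch of a sequential depth-first sphere detector compared against exhaustive ML search. It is an illustration under simplifying assumptions, not the PSD algorithm proposed here: the system model is real-valued, the channel is assumed to be already reduced to an upper-triangular matrix `R` (for example by a QR decomposition), and the function names, the 4-PAM alphabet, and the numeric values are all hypothetical.

```python
from itertools import product


def partial_metric(R, y, s, level):
    """Squared residual of row `level`, which depends only on s[level:]."""
    n = len(y)
    residual = y[level] - sum(R[level][j] * s[j] for j in range(level, n))
    return residual * residual


def sphere_detect(R, y, alphabet):
    """Depth-first tree search with a shrinking radius (assumed real-valued model).

    Returns (symbols, metric, visited_nodes). R must be upper triangular.
    """
    n = len(y)
    state = {"metric": float("inf"), "symbols": None, "visited": 0}
    s = [0] * n  # symbols of the path currently being explored

    def search(level, metric_so_far):
        # Evaluate all children of the current node and sort them by branch metric.
        children = []
        for a in alphabet:
            s[level] = a
            children.append((partial_metric(R, y, s, level), a))
        children.sort()
        for branch, a in children:
            total = metric_so_far + branch
            if total >= state["metric"]:
                break  # children are sorted, so the remaining ones lie outside the sphere
            s[level] = a
            state["visited"] += 1
            if level == 0:
                # Leaf reached: tighten the radius to this candidate's metric.
                state["metric"] = total
                state["symbols"] = list(s)
            else:
                search(level - 1, total)

    search(n - 1, 0.0)  # detection starts from the last row of R
    return state["symbols"], state["metric"], state["visited"]


def brute_force_ml(R, y, alphabet):
    """Exhaustive ML search over all len(alphabet)**n candidates."""
    n = len(y)
    best_s, best_m = None, float("inf")
    for cand in product(alphabet, repeat=n):
        m = sum(partial_metric(R, y, cand, k) for k in range(n))
        if m < best_m:
            best_s, best_m = list(cand), m
    return best_s, best_m


# Hypothetical 4x4 example with a 4-PAM alphabet.
ALPHABET = [-3, -1, 1, 3]
R = [[2.0, 0.5, -0.3, 0.1],
     [0.0, 1.5, 0.4, -0.2],
     [0.0, 0.0, 1.8, 0.6],
     [0.0, 0.0, 0.0, 1.2]]
s_true = [1, -3, 3, -1]
noise = [0.1, -0.2, 0.05, 0.15]
y = [sum(R[i][j] * s_true[j] for j in range(4)) + noise[i] for i in range(4)]

sd_symbols, sd_metric, sd_visited = sphere_detect(R, y, ALPHABET)
ml_symbols, ml_metric = brute_force_ml(R, y, ALPHABET)
```

In this sketch the exhaustive search evaluates all 4^4 = 256 candidate vectors, whereas the sphere search prunes every branch whose accumulated metric already exceeds the best leaf found so far. The recursion also shows the sequential dependency: each pruning decision relies on the radius set by earlier leaves.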