This paper presents an optimized low-complexity, high-throughput multiple-input multiple-output (MIMO) signal detector core for detecting spatially multiplexed data streams. The core architecture supports configurations of up to 4 layers and configurable modulation constellations up to 256-QAM on each layer, while achieving near-optimal performance. The core can operate as a soft-input soft-output log-likelihood ratio (LLR) MIMO detector, which makes it suitable for iterative detection and decoding. High area efficiency is achieved through algorithmic and architectural optimizations performed at two levels. First, the distance computations and slicing operations of an optimal 2-layer maximum a posteriori (MAP) MIMO detector are optimized to eliminate multipliers and to reduce the overhead of slicing in the presence of soft-input LLRs. We show that distances can be computed using elementary addition operations only, while optimal slicing is performed via efficient comparisons with soft decision boundaries, resulting in a simple feed-forward pipelined architecture. Second, to support more layers, an efficient channel decomposition scheme is presented that reduces the detection of multiple layers to multiple 2-layer detection subproblems. These subproblems map onto the 2-layer core with a slight modification that adds a distance accumulation stage and a post-LLR processing stage. Various architectures are developed accordingly to achieve a desired detection throughput and run-time reconfigurability by time-multiplexing one or more component cores. The proposed core is also applied to the design of an optimal multi-user MIMO detector for LTE. The core occupies an area of 1.58 MGE and achieves a throughput of 733 Mbps for 256-QAM when synthesized in 90 nm CMOS.
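To illustrate the idea of slicing by comparison with decision boundaries, the following minimal Python sketch covers only the hard-input case on one real axis of a square QAM constellation (16 levels per axis for 256-QAM). It is not the soft-input scheme of the proposed core, whose boundaries are not specified in this abstract. The function names `pam_levels`, `slice_by_comparison`, and `slice_by_search`, and the unnormalized odd-integer amplitude grid, are assumptions made for illustration.

```python
def pam_levels(m):
    """Odd-integer amplitudes of one QAM axis, e.g. m=4 -> [-3, -1, 1, 3]."""
    return [2 * k - (m - 1) for k in range(m)]


def slice_by_comparison(x, m):
    """Nearest level, found by counting the decision boundaries that x exceeds.

    Only comparisons are used; no distance is computed or multiplied.
    """
    levels = pam_levels(m)
    # Hard-decision boundaries lie midway between adjacent levels.
    boundaries = [level + 1 for level in levels[:-1]]
    index = sum(1 for b in boundaries if x > b)
    return levels[index]


def slice_by_search(x, m):
    """Reference slicer: exhaustive minimum squared-distance search."""
    return min(pam_levels(m), key=lambda s: (x - s) ** 2)


if __name__ == "__main__":
    # One axis of 256-QAM has 16 levels.
    for x in (-20.0, -2.7, 0.4, 6.0, 13.9):
        print(x, slice_by_comparison(x, 16), slice_by_search(x, 16))
```

In hardware, the comparisons against fixed boundaries can be evaluated in parallel, which is consistent with the feed-forward structure described above. With soft-input LLRs the boundaries would shift according to the a priori information, a detail this sketch does not model.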