For practical use, an efficient and numerically stable implementation of the promising vertical Bell Labs Layered Space-Time (V-BLAST) system is highly desirable. The original square-root algorithm (SRA), proposed by Hassibi for minimum mean square error (MMSE) detection, consists of three stages: initialization, ordering, and nulling. Unlike the original SRA, whose initialization stage is not completely based on unitary transformations, all three stages of our proposed algorithm #1 are based on unitary transformations. The average number of multiplications required by our proposed algorithm #1 is about $(29/10)M^3$, where the numbers of transmit and receive antennas are both equal to $M$. By comparison, the average number of multiplications required by the original SRA is $(17/3)M^3$. In addition to the stable initialization, ordering, and nulling of the proposed algorithm #1, our proposed algorithm #2 also provides stable detection. Our proposed algorithm #2 is based completely on Givens rotations, so it can be implemented on COordinate Rotation DIgital Computer (CORDIC)-based hardware.

... of the algorithms in [2] and [3] do not use unitary transformations. The fast algorithm of [4] does not use unitary transformations at all. These algorithms may suffer from numerical instability when the numbers of transmit and receive antennas are large.

On the basis of the steps of the original SRA, we propose in this paper several new results that improve the original algorithm in both numerical stability and computational efficiency. Our numerically stable proposed algorithm #1 is based completely on either Householder transformations or Givens rotations. As for computational efficiency, we use the number of complex multiplications as our measure.
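
As a quick worked comparison, taking the leading-order counts quoted in the abstract to be $(29/10)M^3$ for the proposed algorithm #1 and $(17/3)M^3$ for the original SRA, and ignoring lower-order terms (an assumption, since only the cubic terms are stated there), the ratio of the two costs is independent of $M$:

```latex
\[
\frac{(29/10)\,M^{3}}{(17/3)\,M^{3}}
  = \frac{29}{10}\cdot\frac{3}{17}
  = \frac{87}{170}
  \approx 0.51 .
\]
```

Under that reading, the proposed algorithm #1 would need, on average, roughly half the multiplications of the original SRA.
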
The number of multiplications required by the algorithms of [2] and [3] is fixed, whereas the number required by our proposed algorithm #1 may vary with the channel matrix. Hence, we express the efficiency of our proposed algorithm #1 in terms of the average number of multiplications. Using computer simulations, we show that our proposed algorithm #1 with initial order requires, on average, the fewest complex multiplications.

Whereas the proposed algorithm #1 uses the same detection as the original SRA [5], our proposed algorithm #2 uses a stable detection that is based on unitary transformations. Our stable detection is motivated by the general decision feedback equalizer [6] and the transformation-based multiuser detection [7]. However, neither of them considers the optimal detection order when detecting symbols. We present the proposed algorithm #2 entirely in terms of Givens rotations, which means that it can be implemented with COordinate Rotation DIgital Computer (CORDIC) algorithms [8] to yield efficient and stable hardware.

The structure of this paper is as follows. The original SRA is introduced in Section 2. Our new computations are given in Section 3. ...
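
To illustrate why an algorithm written purely in Givens rotations suits CORDIC hardware, the following is a minimal sketch of the CORDIC vectoring mode, which rotates a real pair (x, y) onto the x-axis using only shifts, additions, and a small table of arctangents. This is the operation a real Givens rotation performs when it zeros one entry. The function name `cordic_vectoring`, the iteration count, and the use of floating point in place of fixed-point arithmetic are illustrative assumptions, not taken from the paper.

```python
import math

def cordic_vectoring(x, y, iterations=24):
    """Rotate (x, y) onto the positive x-axis with CORDIC micro-rotations.

    Returns (magnitude, angle) of the input vector. Assumes x > 0, which
    keeps the angle inside the CORDIC convergence range. Floating point
    is used for clarity; hardware would use fixed-point shifts and adds.
    """
    angle = 0.0
    for i in range(iterations):
        step = 2.0 ** -i                 # a right shift by i bits in hardware
        d = -1.0 if y > 0 else 1.0       # choose the direction that drives y toward 0
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)     # atan(2^-i) would come from a lookup table
    # Each micro-rotation scales the vector by sqrt(1 + 2^-2i); undo the total gain.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    return x / gain, angle

if __name__ == "__main__":
    magnitude, angle = cordic_vectoring(3.0, 4.0)
    print(magnitude, angle)
```

The returned angle defines the rotation that annihilates y, so applying the same sequence of micro-rotations to the remaining entries of the two affected rows would complete one Givens step. Handling complex-valued entries, as a V-BLAST channel matrix requires, would take additional rotations and is not shown here.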