Although the sum capacity of decentralized schemes, at either transmission or reception, has already been computed in the literature, practical implementation schemes that achieve it are lacking. This paper presents a novel precoder design for the multi-antenna broadcast channel (BC) that attains the BC sum capacity. The proposed BC architecture is non-iterative and first implements the so-called embedded power loading, which consists of the optimal power waterfilling and eigenbeamforming of the equivalent point-to-point (PtP) multiple-input multiple-output (MIMO) channel. Because the resulting equivalent channel is not diagonal and the decentralized receivers, unlike in the PtP MIMO case, cannot cooperate to cancel the interference, the rest of the precoder is devoted to reproducing the residual interference of the optimal dual multiple access channel (MAC), which has been previously computed. So as not to introduce additional power, only orthogonal transformations together with Tomlinson-Harashima precoding (THP) are used for this purpose. A numerical example illustrates the proposed strategy and its implementation.
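To make the PtP power waterfilling step concrete, the following is a minimal sketch of the textbook waterfilling allocation over the eigenmodes of a channel. It is not the precoder design proposed in this paper: the function name `waterfill`, the noise-normalized eigenmode gains `gains`, and the power budget `total_power` are assumptions introduced only for illustration.

```python
def waterfill(gains, total_power):
    """Textbook waterfilling: p_i = max(0, mu - 1/g_i) with sum(p_i) = total_power.

    gains       -- hypothetical noise-normalized eigenmode gains (all > 0)
    total_power -- hypothetical total transmit power budget (> 0)
    """
    if total_power <= 0 or not gains or min(gains) <= 0:
        raise ValueError("gains and total_power must be strictly positive")

    # Inverse gains in ascending order: the strongest eigenmode comes first.
    inv = sorted(1.0 / g for g in gains)

    # Try using the k strongest modes, dropping the weakest one until the
    # water level mu lies above the floor of every mode still in use.
    mu = 0.0
    for k in range(len(inv), 0, -1):
        mu = (total_power + sum(inv[:k])) / k
        if mu > inv[k - 1]:
            break

    # Modes whose floor 1/g_i exceeds the water level receive no power.
    return [max(0.0, mu - 1.0 / g) for g in gains]


powers = waterfill([4.0, 1.0, 0.25], total_power=1.0)
print(powers)
```

In this example the weakest eigenmode is switched off and the whole budget is split between the two stronger ones, so the allocated powers sum to the budget without exceeding it.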