We present an enhanced framework for peak-to-average power ratio (PAPR) reduction and waveform design in multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems, based on a convolutional autoencoder (CAE) architecture. An end-to-end learning-based autoencoder (AE) for communication networks models the network as an encoder and a decoder, with the learned latent representation passing through a physical communication channel between them. We introduce a joint learning scheme based on projected gradient descent iterations to optimize the spectral mask behavior and MIMO detection under the influence of a non-linear high-power amplifier (HPA) and a multipath fading channel. The proposed waveform design technique is efficient to implement, as it uses only a single PAPR reduction block for all antennas. It is throughput-lossless, as no side information is required at the decoder. Performance is analyzed in terms of the bit error rate (BER), the PAPR, and the spectral response, and is compared with classical PAPR reduction and MIMO detection methods on simulated 5G data. The proposed system exhibits competitive performance when all optimization criteria are considered simultaneously. We apply gradual loss learning for multi-objective optimization and show empirically that a single trained model covers the tasks of PAPR reduction, spectrum design, and MIMO detection together over a wide range of SNR levels.
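
For readers unfamiliar with the PAPR metric used throughout, the following is a minimal sketch of how it can be computed for a single OFDM symbol. It is an illustration only and not the framework described above: the function names `papr_db` and `ofdm_symbol`, the QPSK mapping, the 64 subcarriers, and the 4x oversampling factor are all assumptions chosen for the example.

```python
import numpy as np


def papr_db(x):
    """PAPR of a complex baseband signal in dB: max|x|^2 / mean|x|^2."""
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())


def ofdm_symbol(num_subcarriers=64, oversampling=4, rng=None):
    """Build one oversampled OFDM time-domain symbol from random QPSK data.

    Hypothetical helper for illustration; assumes an even number of
    subcarriers and no cyclic prefix, pilots, or guard bands.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    bits = rng.integers(0, 2, size=(num_subcarriers, 2))
    # Unit-energy QPSK: map bit pairs to (+/-1 +/- 1j) / sqrt(2).
    qpsk = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

    # Oversample by zero-padding the middle of the spectrum, so the
    # discrete-time peaks better approximate the continuous-time peaks.
    n_fft = num_subcarriers * oversampling
    half = num_subcarriers // 2
    spectrum = np.zeros(n_fft, dtype=complex)
    spectrum[:half] = qpsk[:half]
    spectrum[-half:] = qpsk[half:]
    return np.fft.ifft(spectrum)


if __name__ == "__main__":
    symbol = ofdm_symbol()
    print(f"PAPR of one OFDM symbol: {papr_db(symbol):.2f} dB")
```

A constant-envelope signal has a PAPR of 0 dB, while an OFDM symbol with N equal-power active subcarriers can reach at most 10 log10(N) dB, which is why a non-linear HPA makes PAPR reduction relevant.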