The large majority of cellular networks deployed today use Frequency Division Duplexing (FDD), where, in contrast with Time Division Duplexing (TDD), channel reciprocity does not hold, and explicit downlink (DL) probing and uplink (UL) feedback are required to achieve a spatial multiplexing gain. In massive MIMO systems, i.e., systems with a very large number of antennas at the base station (BS), the overhead incurred by conventional DL probing and UL feedback schemes scales linearly with the number of BS antennas and may therefore become very large. In this paper, we present a new approach that achieves a very competitive tradeoff between spatial multiplexing gain and probing-feedback overhead in such systems. Our approach is based on two novel methods: (i) an efficient regularization technique based on Deep Neural Networks (DNNs) that learns the Angular Spread Function (ASF) of the user channels and estimates the DL covariance matrix from the noisy i.i.d. channel observations obtained at no extra cost via UL pilots (UL-DL covariance transformation); (ii) a novel "sparsifying precoding" technique that uses the DL covariance matrix estimated in (i) to impose a controlled sparsity on the DL channel, such that, for any assigned DL pilot dimension, it finds an optimal sparsity level and a corresponding sparsifying precoder for which the "effective" channel vectors after sparsification can be estimated at the BS with low mean-square error. We compare our proposed DNN-based method in (i) with other methods in the literature via numerical simulations and show that it yields very competitive performance. We also compare our sparsifying precoder in (ii) with state-of-the-art statistical beamforming methods, under the assumption that those methods also have access to the DL covariance knowledge, and show that our method yields higher spectral efficiency, since it additionally exploits the instantaneous channel information of the sparsified effective channel.
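To make the UL-DL covariance transformation in (i) concrete, the following is a minimal sketch of the ASF-based covariance model commonly assumed in this line of work; the symbols ($M$, $\rho(\theta)$, $\mathbf{a}(\theta; f)$, $f_{\rm ul}$, $f_{\rm dl}$) are illustrative notation, not necessarily that of the paper. For a BS array with $M$ antennas and array response $\mathbf{a}(\theta; f) \in \mathbb{C}^M$ at carrier frequency $f$, the UL and DL channel covariance matrices can be written as
\[
\mathbf{R}_{\rm ul} = \int \rho(\theta)\, \mathbf{a}(\theta; f_{\rm ul})\, \mathbf{a}(\theta; f_{\rm ul})^{\sf H}\, d\theta, \qquad
\mathbf{R}_{\rm dl} = \int \rho(\theta)\, \mathbf{a}(\theta; f_{\rm dl})\, \mathbf{a}(\theta; f_{\rm dl})^{\sf H}\, d\theta,
\]
where $\rho(\theta) \ge 0$ is the ASF. Since the ASF is commonly assumed to be invariant over the UL-DL duplex frequency gap, an estimate $\widehat{\rho}$ learned from UL pilot observations can be substituted into the second integral to produce the DL covariance estimate $\widehat{\mathbf{R}}_{\rm dl}$.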