Distributed video coding (DVC) is a coding paradigm which exploits the redundancy of the source (video) at the decoder side, as opposed to predictive coding, where the encoder leverages the redundancy. To exploit the correlation between views, multiview predictive video codecs require the encoder to have the various views available simultaneously. However, in multiview DVC (M-DVC), the decoder can still exploit the redundancy between views, avoiding the need for inter-camera communication. The key element of every DVC decoder is the side information (SI), which can be generated by leveraging intra-view or inter-view redundancy for multiview video data. In this paper, a novel learning-based fusion technique is proposed, which is able to robustly fuse an inter-view SI and an intra-view (temporal) SI. An inter-view SI generation method capable of identifying occluded areas is proposed and is coupled with a robust fusion system able to improve the quality of the fused SI along the decoding process through a learning process using already decoded data. We shall here take the approach to fuse the estimated distributions of the SIs as opposed to a conventional fusion algorithm based on the fusion of pixel values. The proposed solution is able to achieve gains up to 0.9 dB in Bjøntegaard difference when compared with the best-performing (in a RD sense) single SI DVC decoder, chosen as the best of an inter-view and a temporal SI-based decoder one.