Recently, a vast amount of satellite data has become available, going beyond standard optical (EO) data to other forms such as synthetic aperture radar (SAR). While more robust to adverse imaging conditions, SAR data are often more difficult to interpret, can be of lower resolution, and require more intensive pre-processing than EO data. Conversely, EO data are more interpretable but often fail under unfavourable lighting, weather, or cloud-cover conditions. To leverage the advantages of both domains, we present a novel autoencoder-based architecture that is able to both (i) fuse multi-spectral optical and radar data in a common shared space, and (ii) perform image segmentation for building footprint detection under the assumption that one of the data modalities is missing, a situation often encountered in real-world settings. To do so, we design a novel randomized skip-connection architecture that utilizes autoencoder weight-sharing. We compare the proposed method to baseline approaches relying on network fine-tuning, and to established architectures such as UNet. Qualitative and quantitative results show the merits of the proposed method, which outperforms all compared techniques for the task at hand.