A paradigm shift is underway in Earth Observation, as deep learning replaces other methods for many predictive tasks. Nevertheless, most deep learning classification models for Earth Observation are limited by their specificity with respect to both the sensors used (inputs) and the classes predicted (outputs), leading to models which only perform well for specific satellites and on specific datasets. Cloud masking is typical in this respect, yet it is among the most important tasks to generalise across sensors, given that it is required for all optical instruments. This work sets out a framework to relax deep learning's constraints on specific inputs and outputs, using cloud and shadow masking as a case study. Centrally, a model is developed which is sensor-independent and which can learn simultaneously from different labelling schemes. The model, the Spectral ENcoder for SEnsor Independence version 2 (SEnSeI-v2), extends the original version by permitting multimodal data (in this case, Sentinel-1 SAR imagery and a DEM) to be ingested, alongside several other architectural improvements. SEnSeI-v2, attached to SegFormer, is shown to achieve state-of-the-art performance, whilst being usable on a range of multispectral band combinations, alongside SAR and DEM inputs, without retraining. The labelling schemes of eight datasets are not made compatible through a reductive approach (e.g. converting to cloud vs. non-cloud); rather, an ambiguous cross-entropy loss is introduced that allows the model to learn from the different labelling schemes without sacrificing the class distinctions of each, leading to a model which predicts all of the constituent classes of the different datasets.
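The abstract does not spell out the form of the ambiguous cross-entropy loss; the following is a minimal sketch of one plausible construction, assuming that for each pixel the predicted probabilities of all classes consistent with that dataset's (possibly coarser) label are summed before taking the negative log-likelihood. The function name and tensor layout are hypothetical, not taken from the paper.

```python
import torch

def ambiguous_cross_entropy(logits: torch.Tensor, admissible: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of an ambiguous cross-entropy loss.

    logits:      (N, C) raw class scores over the full, fine-grained class set.
    admissible:  (N, C) boolean mask, True for every class consistent with a
                 pixel's label under its dataset's labelling scheme.

    Rather than collapsing each coarse label onto a single class, the
    probability mass of all admissible classes is summed, so the model can
    still learn the fine class distinctions present in other datasets.
    """
    probs = torch.softmax(logits, dim=-1)
    p_admissible = (probs * admissible).sum(dim=-1)
    return -torch.log(p_admissible.clamp_min(1e-12)).mean()
```

Under this construction, a dataset labelling only cloud vs. non-cloud would mark all cloud subclasses as admissible for a "cloud" pixel, while a dataset with finer labels would mark exactly one class, reducing the loss to standard cross-entropy there.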