Deep learning methods have become ubiquitous tools in many Earth observation applications, delivering state-of-the-art results while generalizing well across a variety of scenarios. One such domain concerns the Sentinel-2 (S2) satellite mission, which provides multispectral images in the form of 13 spectral bands, captured at three different spatial resolutions: 10, 20, and 60 m. This research aims to provide a super-resolution mechanism based on fully convolutional neural networks (CNNs) for upsampling the low-resolution (LR) spectral bands of S2 up to 10-m spatial resolution. Our approach is centered on attaining good performance with respect to two main properties: consistency and synthesis. While the synthesis evaluation, also known as Wald's protocol, has served as the benchmark for almost all previously introduced methods, the consistency property has been overlooked as a viable evaluation procedure. Recently introduced techniques make use of the sensor's modulation transfer function (MTF) to learn an approximate inverse mapping from LR to high-resolution images, which directly targets a good consistency value. To this end, we propose a multiobjective loss for training our architectures, comprising an MTF-based mechanism, a direct input-output mapping using synthetically degraded data, and direct similarity measures between the high-frequency details of the already available 10-m bands and those of the super-resolved images. Experiments indicate that our method achieves a good tradeoff between the consistency and synthesis properties, along with competitive visual quality results.
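To make the multiobjective loss concrete, the sketch below shows one plausible way its three terms could be combined. This is a minimal illustration, not the paper's actual implementation: the Gaussian approximation of the MTF, the `sigma` and `scale` values, the weighting scheme, and all function names are assumptions introduced here for clarity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mtf_degrade(img, scale=2, sigma=1.0):
    # Approximate the sensor MTF with a Gaussian blur (assumed model),
    # then decimate to the lower resolution.
    return gaussian_filter(img, sigma)[::scale, ::scale]

def high_pass(img, sigma=1.0):
    # Extract high-frequency details as the residual of a Gaussian low-pass.
    return img - gaussian_filter(img, sigma)

def multiobjective_loss(sr, lr, target_hr, ref_10m, weights=(1.0, 1.0, 0.1)):
    """Hypothetical combination of the three objectives described above.

    sr        -- super-resolved band (network output)
    lr        -- original low-resolution band
    target_hr -- reference band under Wald's protocol (from degraded inputs)
    ref_10m   -- an already available 10-m band, same size as sr
    """
    a, b, c = weights  # assumed relative weights
    # Consistency: MTF-degrading the SR output should reproduce the LR input.
    consistency = np.mean((mtf_degrade(sr) - lr) ** 2)
    # Synthesis (Wald's protocol): SR of synthetically degraded data
    # should match the original-resolution reference.
    synthesis = np.mean((sr - target_hr) ** 2)
    # High-frequency similarity to the native 10-m bands.
    detail = np.mean((high_pass(sr) - high_pass(ref_10m)) ** 2)
    return a * consistency + b * synthesis + c * detail
```

In this toy formulation, a perfect super-resolution of a band whose LR version was produced by the same degradation model yields a loss of zero, while any deviation in consistency, synthesis, or detail similarity increases it.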