Airborne remote sensing observations over the tropical Atlantic Ocean upstream of Barbados are used to characterize trade wind shallow cumulus clouds and to benchmark two cloud-resolving ICON (ICOsahedral Nonhydrostatic) model simulations at kilometer and hectometer scales. The clouds were observed by an airborne nadir-pointing backscatter lidar, a cloud radar, and a microwave radiometer during daytime in the tropical dry winter season. For the model benchmark, forward operators convert the model data into the observational space so that instrument-specific cloud detection thresholds can be taken into account. The forward simulations reveal the different detection limits of the lidar and radar observations: most clouds with a cloud liquid water content greater than 10⁻⁷ kg kg⁻¹ are detectable by the lidar, whereas the radar is primarily sensitive to the "rain"-category hydrometeors in the models and can detect even low amounts of rain. The observations reveal two prominent modes of cumulus cloud top heights, which separate the clouds into two layers. The lower mode relates to boundary layer convection with tops closely above the lifted condensation level, which is at about 700 m above sea level. The upper mode is driven by shallow moist convection, also contains shallow outflow anvils, and is closely related to the trade inversion at about 2.3 km above sea level. The two cumulus modes are reflected differently by the lidar and the radar observations and under different liquid water path (LWP) conditions. The storm-resolving model (SRM) at kilometer scale barely reproduces the cloud modes and places most cloud tops slightly above the observed lower mode. The large-eddy model (LEM) at hectometer scale better reproduces the observed cloudiness distribution, with a clear bimodal separation. We hypothesize that slight differences in the autoconversion parametrizations could have caused the different cloud development in the models.
Neither model seems to account for in-cloud drizzle particles that do not precipitate down to the surface but generate a stronger radar signal even in scenes with low LWP. Our findings suggest that even if the SRM is a step forward for better cloud representation in climate research, the LEM can better reproduce the observed shallow cumulus convection and should therefore, in principle, represent cloud radiative effects and the water cycle better.

The representation of low-level oceanic clouds contributes substantially to differences between climate models in terms of equilibrium climate sensitivity (Schneider et al., 2017). Global atmospheric models with kilometer-scale resolution are considered the way forward in forecasting future climate scenarios (Bony and Dufresne, 2005; Satoh et al., 2019). The increased model resolution and the better match between model and measurement scales allow for a more direct observational assessment: the present-day representation in the models can be compared with atmospheric measurements, thus anchoring the models to reality. Recently, Stevens et al. (2020) demonstrated the general advantage...