The retrieval of chlorophyll-a (Chl-a) concentrations relies on empirical or analytical analyses, which generally experience difficulties from the diversity of inland waters in statistical analyses and the complexity of radiative transfer equations in analytical analyses, respectively. Previous studies proposed the utilization of artificial neural networks (ANNs) to alleviate these problems. However, ANNs do not consider the problem of insufficient in situ samples during model training, and they do not fully utilize the spatial and spectral information of remote sensing images in neural networks. In this study, a two-stage training is introduced to address the problem regarding sample insufficiency. The neural network is pretrained using the samples derived from an existing Chl-a concentration model in the first stage, and the pretrained model is refined with in situ samples in the second stage. A novel convolutional neural network for Chl-a concentration retrieval called WaterNet is proposed which utilizes both spectral and spatial information of remote sensing images. In addition, an end-to-end structure that integrates feature extraction, band expansion, and Chl-a estimation into the neural network leads to an efficient and effective Chl-a concentration retrieval. In experiments, Sentinel-3 images with the same acquisition days of in situ measurements over Laguna Lake in the Philippines were used to train and evaluate WaterNet. The quantitative analyses show that the two-stage training is more likely than the one-stage training to reach the global optimum in the optimization, and WaterNet with two-stage training outperforms, in terms of estimation accuracy, related ANN-based and band-combination-based Chl-a concentration models.