Marine aquaculture plays an important role in seafood supply, economic development, and the provision of coastal ecosystem services. Precise delineation of marine aquaculture areas from high spatial resolution (HSR) imagery is vital for the sustainable development and management of coastal marine resources. However, the varied sizes and fine structures of marine objects make accurate mapping from HSR images difficult with conventional methods. This study therefore extracts marine aquaculture areas with an automatic labeling method based on a convolutional neural network (CNN), namely an end-to-end hierarchical cascade network (HCNet). For marine objects of various sizes, we improve classification performance by exploiting multi-scale contextual information: starting from the output of a CNN encoder, we apply atrous convolutions to capture context at multiple scales and aggregate it in a hierarchical cascade. For marine objects with detailed structures, we gradually refine the detail through a series of long-span connections that bring in fine-resolution features from the shallow layers. In addition, to narrow the semantic gaps between features at different levels, we refine the feature space (i.e., the channel and spatial dimensions) with an attention-based module. Experimental results show that the proposed HCNet effectively identifies and distinguishes different kinds of marine aquaculture, with an overall accuracy of 98%. It also achieves better classification performance than an object-based support vector machine and state-of-the-art CNN-based methods such as FCN-32s, U-Net, and DeeplabV2. The developed method lays a solid foundation for the intelligent monitoring and management of coastal marine resources.
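To make the idea of multi-scale context from atrous convolutions more concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the HCNet implementation: the function names (`atrous_conv1d`, `cascade_context`, `receptive_field`), the dilation rates (1, 2, 4), and the choice to sum the stage outputs are all assumptions made for illustration, and the actual network operates on two-dimensional feature maps with learned kernels.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `rate` samples apart, so a small kernel
    covers a wider span of the input without extra weights.
    The output has the same length as the input.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def cascade_context(signal, kernel, rates=(1, 2, 4)):
    """Hypothetical hierarchical cascade (an assumption, not the paper's design).

    Each stage convolves the previous stage's output with a larger
    dilation rate, and the outputs of all stages are summed so the
    result mixes context from several scales.
    """
    stage_input = signal
    aggregated = [0.0] * len(signal)
    for rate in rates:
        stage_output = atrous_conv1d(stage_input, kernel, rate)
        aggregated = [a + s for a, s in zip(aggregated, stage_output)]
        stage_input = stage_output
    return aggregated


def receptive_field(kernel_size, rates):
    """Receptive field (in samples) of stacked atrous convolutions."""
    field = 1
    for rate in rates:
        field += (kernel_size - 1) * rate
    return field
```

Under these assumptions, stacking a 3-tap kernel at rates 1, 2, and 4 gives a receptive field of 15 samples, whereas three ordinary (rate 1) 3-tap convolutions would cover only 7. This is the sense in which atrous convolutions capture wider context at no additional parameter cost.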