Although modern machine learning has the potential to greatly speed up the interpretation of imagery, the varied nature of the seabed and the limited availability of expert annotations are barriers to its widespread use in seafloor mapping applications. This motivates research into unsupervised methods that function without large databases of human annotations. This paper develops an unsupervised feature learning method for georeferenced seafloor visual imagery that considers both patterns within the footprint of a single image frame and broader-scale spatial characteristics. Features within images are learnt using an autoencoder based on the AlexNet deep convolutional neural network. Features larger than each image frame are learnt using a novel loss function that regularises autoencoder training with the Kullback-Leibler divergence, encoding the loose assumption that images captured close to one another look more similar than those captured far apart. The method is used to semantically interpret images taken by an autonomous underwater vehicle at the Southern Hydrates Ridge, an active gas hydrate field and site of a seafloor cabled observatory at a depth of 780 m. The method's performance when applied to clustering and content-based image retrieval is assessed against a ground truth of more than 18,000 human annotations. The study shows that the location-based loss function doubles the rate of information retrieval for seafloor mapping applications. The effects of physics-based colour correction and image rescaling are also investigated, showing that the improved consistency of spatial information achieved by rescaling is beneficial for recognising artificial objects such as cables and infrastructure, but is less effective for natural objects, which have greater dimensional variability.
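The location-guided regularisation described above can be illustrated with a minimal sketch. The snippet below is not the paper's exact formulation: it assumes the geographic and latent neighbourhood structures of a training batch are each summarised as row-wise softmax distributions over pairwise distances, and the `temperature` and `beta` parameters are hypothetical choices for the softmax scale and the regularisation weight.

```python
import torch
import torch.nn.functional as F

def location_guided_loss(x, x_recon, z, positions, temperature=1.0, beta=0.1):
    """Reconstruction loss plus a KL regulariser that encourages the
    latent similarity structure of a batch to mirror its geographic
    similarity structure (nearby images -> similar latent codes).

    x, x_recon : (B, C, H, W) input images and autoencoder reconstructions
    z          : (B, D) latent codes from the encoder
    positions  : (B, 2) georeferenced positions (e.g. easting, northing) in metres
    """
    recon = F.mse_loss(x_recon, x)

    # Pairwise distances in geographic and latent space.
    geo_d = torch.cdist(positions, positions)  # (B, B)
    lat_d = torch.cdist(z, z)                  # (B, B)

    # Convert distances to row-wise similarity distributions; closer
    # pairs receive more probability mass.
    p_geo = F.softmax(-geo_d / temperature, dim=1)      # target distribution
    log_q = F.log_softmax(-lat_d / temperature, dim=1)  # model distribution

    # KL(p_geo || q_latent): latent neighbourhoods should loosely
    # follow geographic neighbourhoods.
    kl = F.kl_div(log_q, p_geo, reduction='batchmean')

    return recon + beta * kl
```

Weighting the KL term with a small `beta` keeps the spatial assumption loose: reconstruction still dominates the objective, so visually distinct images captured at nearby locations are not forced onto identical latent codes.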
We describe a novel semi-supervised learning method that reduces the labelling effort needed to train convolutional neural networks (CNNs) when processing georeferenced imagery. This allows deep-learning CNNs to be trained on a per-dataset basis, which is useful in domains where there is limited learning transferability across datasets. The method identifies representative subsets of images from an unlabelled dataset based on the latent representation of a location-guided autoencoder. We assess the method's sensitivity to design options using four ground-truthed datasets of georeferenced environmental monitoring images, spanning a variety of aerial and seafloor scenes. Efficiency gains are achieved for all the aerial and seafloor image datasets analysed in our experiments, demonstrating the benefit of the method across application domains. Compared to CNNs of the same architecture trained using conventional transfer and active learning, the method achieves equivalent accuracy with an order of magnitude fewer annotations, and reaches 85% of the accuracy of CNNs trained conventionally on approximately 10,000 human annotations using just 40 prioritised annotations. The biggest efficiency gains are seen in datasets with unbalanced class distributions and rare classes that have relatively few observations.
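One plausible way to realise the representative-subset selection, assuming the latent vectors of the location-guided autoencoder have already been computed for the unlabelled dataset, is to cluster them and send the image nearest each cluster centre for annotation. The function below is a hypothetical sketch using k-means; the budget of 40 annotations mirrors the figure quoted above and is not prescribed by the method itself.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(latents, n_labels=40, seed=0):
    """Pick a representative subset of images to annotate by clustering
    the autoencoder's latent vectors and taking the image closest to
    each cluster centre.

    latents  : (N, D) latent representations of the unlabelled dataset
    n_labels : annotation budget (number of images to prioritise)
    returns  : indices of the images to send for human annotation
    """
    km = KMeans(n_clusters=n_labels, n_init=10, random_state=seed).fit(latents)
    reps = []
    for k in range(n_labels):
        # Among the members of cluster k, keep the one nearest its centre.
        members = np.where(km.labels_ == k)[0]
        dists = np.linalg.norm(latents[members] - km.cluster_centers_[k], axis=1)
        reps.append(members[np.argmin(dists)])
    return np.array(reps)
```

Because each cluster contributes exactly one annotation, a sampling scheme of this kind naturally allocates labels to rare but latent-distinct image classes, which is consistent with the efficiency gains reported for unbalanced datasets.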