More frequent and thorough inspection of sewer pipes has the potential to save utilities billions. However, both the frequency and quality of inspections are limited by an imprecise and highly subjective manual process, in which technicians judge stretches of sewer based on video from remote-controlled robots. Determining the state of sewer pipes from these videos entails a great deal of ambiguity. Furthermore, the frequency with which the different defects occur varies widely, leading to highly imbalanced datasets. Such datasets are a poor basis for automating the labeling process with supervised learning. In this paper, we explore the potential of self-supervision as a method for reducing the need for large, well-balanced labeled datasets. First, our models learn to represent the data distribution using more than a million unlabeled images; then a small number of labeled examples is used to learn a mapping from the learned representations to a relevant target variable, in this case water level. We choose a convolutional Autoencoder, a Variational Autoencoder, and a Vector-Quantised Variational Autoencoder as the basis for our experiments. The best representations are learned by the classic Autoencoder, with a Multi-Layer Perceptron on top achieving a Mean Absolute Error of 9.93, an improvement of 9.62 over the fully supervised baseline.
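The two-stage pipeline described above can be sketched in miniature. This is only an illustrative stand-in, not the paper's method: the autoencoder here is linear rather than convolutional, the data and water-level labels are randomly generated, and the supervised head is ridge regression instead of a Multi-Layer Perceptron.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: self-supervised pretraining on unlabeled data.
# A tiny linear autoencoder trained by gradient descent on reconstruction loss
# (stand-in for the paper's convolutional AE / VAE / VQ-VAE).
X = rng.normal(size=(500, 64))            # stand-in for unlabeled image data
d, k = X.shape[1], 8                      # input dim, latent dim
W_enc = rng.normal(scale=0.1, size=(d, k))
W_dec = rng.normal(scale=0.1, size=(k, d))

loss_init = np.mean((X @ W_enc @ W_dec - X) ** 2)
lr = 1e-3
for _ in range(200):
    Z = X @ W_enc                         # encode
    err = Z @ W_dec - X                   # reconstruction error
    g_dec = Z.T @ err / len(X)            # gradient of MSE w.r.t. decoder
    g_enc = X.T @ (err @ W_dec.T) / len(X)  # gradient w.r.t. encoder
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
loss_final = np.mean((X @ W_enc @ W_dec - X) ** 2)

# Stage 2: supervised head on a small labeled subset.
# Map frozen latent codes to the target variable (here: water level).
n_lab = 50
Z_lab = X[:n_lab] @ W_enc
y = rng.uniform(0, 100, size=n_lab)       # hypothetical water-level labels (%)
lam = 1e-2                                # ridge regularisation strength
W_head = np.linalg.solve(Z_lab.T @ Z_lab + lam * np.eye(k), Z_lab.T @ y)
mae = np.abs(Z_lab @ W_head - y).mean()
```

The point of the sketch is the division of labor: the encoder is fit on plentiful unlabeled data, and only the small head is fit on the scarce labels.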