Deep learning techniques have become popular in marine science in recent years. For instance, many works apply instance segmentation neural networks (in particular, Mask R-CNN) to the detection and classification of fish in underwater images. The performance of these learning-based approaches depends heavily on the volume of training data, which, for instance segmentation models for fish detection, means that human experts must label and outline every fish appearing in a large number of underwater images. This is an enormously time-consuming task that we seek to alleviate in this paper. We propose a training strategy that combines manual and semi-automatic annotations. The latter are obtained in a weakly supervised manner: the bounding box containing the fish is selected manually, but the fish's shape is obtained automatically by a pretrained encoder-decoder segmentation network. We examine several popular architectures for this encoder-decoder network. The strategy drastically reduces the annotation cost for instance segmentation, at the expense of a small drop in performance with respect to fully manual annotations. We show that the proposed strategy achieves a balance between segmentation performance and the time spent collecting the training data.
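As an illustration of the semi-automatic annotation step, the following minimal sketch turns a manually selected bounding box into a full-image instance mask. It is not the implementation used in this work: the function names, the `(x0, y0, x1, y1)` box convention, and the 0.5 threshold are assumptions, and `placeholder_segmenter` is a hypothetical stand-in (simple intensity normalization) for the pretrained encoder-decoder network.

```python
import numpy as np


def placeholder_segmenter(crop):
    """Hypothetical stand-in for a pretrained encoder-decoder network.

    Returns a per-pixel foreground score in [0, 1] for the cropped region,
    here obtained by min-max normalizing the pixel intensities.
    """
    gray = crop.mean(axis=-1) if crop.ndim == 3 else crop.astype(float)
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        # Flat crop: no evidence of a foreground object.
        return np.zeros(gray.shape, dtype=float)
    return (gray - lo) / (hi - lo)


def box_to_instance_mask(image, box, segmenter=placeholder_segmenter,
                         threshold=0.5):
    """Build a full-image boolean mask from one manually drawn box.

    `box` is (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive
    (an assumed convention). Pixels outside the box are always background.
    """
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    scores = segmenter(crop)

    mask = np.zeros(image.shape[:2], dtype=bool)
    # Paste the thresholded crop prediction back at the box location.
    mask[y0:y1, x0:x1] = scores >= threshold
    return mask
```

In a real pipeline, `segmenter` would wrap the forward pass of the pretrained network, and the resulting masks would be mixed with the fully manual annotations to train the instance segmentation model.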