Accurate detection and quantification of submerged targets is a key challenge in marine exploration, one that traditional census approaches cannot handle efficiently. Here we present a deep learning approach for detecting the pattern of a moving fish from the reflections of an active acoustic emitter. To allow real-time detection, we use a convolutional neural network that labels a large buffer of signal samples simultaneously. This makes it possible to capture the structure of the signal reflected from the moving target and to separate it from clutter reflections. We evaluate system performance both on synthetic (simulated) data and on real data recorded over 50 sea experiments in a variety of sea conditions. When tested on real signals, a network trained on simulated patterns showed non-trivial detection capability, suggesting that transfer learning can be a viable approach in these scenarios, where labeled data is often scarce. However, training the network directly on the real reflections with data augmentation techniques yielded a more favorable precision-recall trade-off, approaching an ideal detection bound. We also evaluate an alternative model based on recurrent neural networks which, despite slightly inferior performance, could be applied in scenarios requiring online processing of the reflection sequence.
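To make the CNN idea concrete, the sketch below shows a fully convolutional 1-D network that maps a buffer of acoustic samples to one label per sample in a single forward pass, which is the property that enables simultaneous labeling of the whole buffer. The buffer length, layer widths, dilation pattern, and binary target/clutter output are illustrative assumptions, not the architecture described in the paper.

```python
# Minimal sketch (assumed architecture, not the authors' implementation):
# a dilated 1-D CNN that assigns a target/clutter label to every sample
# of an input buffer in one pass.
import torch
import torch.nn as nn

class BufferLabeler(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # Padding is chosen so the output length equals the input length,
        # giving one logit pair per input sample.
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=7, padding=6, dilation=2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=7, padding=12, dilation=4),
            nn.ReLU(),
            nn.Conv1d(channels, 2, kernel_size=1),  # per-sample logits: clutter vs. target
        )

    def forward(self, x):
        # x: (batch, 1, buffer_length) raw or envelope-detected reflection samples
        return self.net(x)

if __name__ == "__main__":
    model = BufferLabeler()
    buffer = torch.randn(4, 1, 4096)   # 4 buffers of 4096 samples (assumed length)
    logits = model(buffer)             # (4, 2, 4096): one logit pair per sample
    labels = logits.argmax(dim=1)      # 0 = clutter/background, 1 = target echo
    print(labels.shape)
```

Because every sample is labeled in one pass over the buffer, this style of network can be run on successive buffers for near-real-time detection; a recurrent model, as mentioned above, would instead consume the reflection sequence sample by sample for strictly online processing.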