Modern research on animal behavior relies heavily on image and video analysis. While such data are now quick to obtain, extracting and analyzing complex behaviors under naturalistic conditions remains a major challenge, especially when the behavior of interest is sporadic and rare. In this study, we present an end-to-end system for capturing, detecting, and analyzing larval fish feeding behavior in unconstrained naturalistic environments. We first constructed a specialized system for imaging these tiny, fast-moving creatures and deployed it in large aquaculture rearing pools. We then designed an analysis pipeline using several action classification backbones and compared their performance. A natural feature of the data was the extremely low prevalence of feeding events, which led to small sample sizes and highly imbalanced datasets despite extensive annotation efforts. Nevertheless, our pipeline successfully detected and classified the sparsely occurring feeding behavior of fish larvae in a curated experimental setting from videos featuring multiple animals. We introduce three new annotated datasets of underwater videography, covering curated and uncurated settings. As challenges related to data imbalance and expert annotation are common to the analysis of animal behavior under naturalistic conditions, we believe our findings can contribute to the growing field of computer vision for the study and understanding of animal behavior.