Marine underwater imaging facilitates non-destructive sampling of species at frequencies, durations, and accuracies that are unattainable by conventional sampling methods. These systems require complex automated processes to identify organisms efficiently; however, current frameworks struggle to disentangle ecological foreground components from their dispensable background content. Underwater image processing typically relies on a common architecture: image binarization to segment potential targets, followed by information extraction and classification by deep learning models. While intuitive, this architecture underperforms because it has difficulty handling high concentrations of biotic and abiotic particles, rapid changes in dominant taxa, and target sizes that vary by several orders of magnitude. To overcome these issues, we present a new framework that begins with a scene classifier to capture large within-image variation, such as disparities in particle concentration and dominant taxa. Following scene classification, scene-specific Mask region-based convolutional neural network (Mask R-CNN) models were trained to separate target objects into different taxonomic groups. The procedure allows information to be extracted from different image types while minimizing potential bias toward commonly occurring features. Using in situ coastal PlanktonScope images, we compared the scene-specific models to a Mask R-CNN model trained on all scene categories without scene classification, defined as the full model, and found that the scene-specific approach outperformed the full model by more than 20% in accuracy on noisy images. The full model missed up to 78% of the dominant taxonomic groups, such as Lyngbya, Noctiluca, and Phaeocystis colonies. This performance improvement is due to the scene classifier, which reduces the variation among images and allows an improved match between the observed taxonomic groups and the taxonomic groups in pre-trained models.
We further tested the framework on images from a benthic video camera and an imaging sonar system. The results demonstrate that the procedure is applicable to different types of underwater images and achieves significantly more accurate results than the full model. Because the unified framework is neither instrument- nor ecosystem-specific, it can be deployed throughout the marine biome.
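The two-stage routing described above (a scene classifier followed by a scene-specific detector, with the full model as the alternative) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `make_scene_router`, `toy_scene_classifier`, and `Detection`, the scene labels, the intensity threshold, and the placeholder scores are all assumptions made for illustration, and simple callables stand in for the trained scene classifier and Mask R-CNN models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical image type: a grayscale image as nested rows of floats in [0, 1].
Image = Sequence[Sequence[float]]


@dataclass
class Detection:
    taxon: str    # predicted taxonomic group
    score: float  # placeholder confidence, not a reported result


SceneClassifier = Callable[[Image], str]
Detector = Callable[[Image], List[Detection]]


def make_scene_router(
    classify_scene: SceneClassifier,
    detectors: Dict[str, Detector],
    fallback: Detector,
) -> Callable[[Image], Tuple[str, List[Detection]]]:
    """Build a function that classifies the scene, then runs the matching detector."""

    def route(image: Image) -> Tuple[str, List[Detection]]:
        scene = classify_scene(image)
        # Use the scene-specific model when one exists, else the full model.
        detector = detectors.get(scene, fallback)
        return scene, detector(image)

    return route


def toy_scene_classifier(image: Image) -> str:
    """Stand-in for a trained scene classifier: thresholds mean intensity."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return "high_particle" if mean > 0.5 else "low_particle"


# Stand-ins for scene-specific Mask R-CNN models and the full model.
def high_particle_model(image: Image) -> List[Detection]:
    return [Detection("Phaeocystis", 0.9)]


def low_particle_model(image: Image) -> List[Detection]:
    return [Detection("Noctiluca", 0.9)]


def full_model(image: Image) -> List[Detection]:
    return [Detection("unknown", 0.5)]


route = make_scene_router(
    toy_scene_classifier,
    {"high_particle": high_particle_model, "low_particle": low_particle_model},
    fallback=full_model,
)
```

In this sketch, the dictionary lookup is the only coupling between the two stages, so scene categories for a different instrument (for example, benthic video or imaging sonar) could in principle be added by registering further detectors without changing the routing logic.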