Environmental monitoring guides conservation and is particularly important for aquatic habitats, which are heavily impacted by human activities. Underwater cameras and uncrewed devices are used to monitor aquatic wildlife, but manual processing of footage is a significant bottleneck to rapid data processing and dissemination of results. Deep learning has emerged as a solution, but its ability to detect animals accurately across habitat types and locations is largely untested for coastal environments. Here, we produced five deep learning models using an object detection framework to detect an ecologically important fish, luderick (Girella tricuspidata). We trained two models on footage from single habitats (seagrass or reef) and three on footage from both habitats, and tested all models on footage from both habitat types. Models performed well on test data from the same habitat type on which they were trained (object detection measure mAP50: 91.7% and 86.9% for seagrass and reef, respectively) but poorly on test sets from the other habitat type (73.3% and 58.4%, respectively). The models trained on a combination of both habitats produced the highest object detection scores on both tests (averaging 92.4% and 87.8%, respectively). The ability of the combination-trained models to correctly estimate the ecological abundance metric MaxN showed similar patterns. These findings demonstrate that deep learning models can extract ecologically useful information from video footage accurately and consistently, and can perform across habitat types when trained on footage from a variety of habitats.
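The two metrics named above can be illustrated concretely. A minimal sketch (not the paper's actual code; box coordinates, function names, and example values are hypothetical) of the IoU-at-0.5 matching criterion that underlies mAP50, and of MaxN, the maximum number of individuals detected in any single frame of a video:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def is_true_positive(pred_box, gt_boxes, thresh=0.5):
    """At mAP50, a predicted box counts as a correct detection if it
    overlaps some ground-truth box with IoU >= 0.5."""
    return any(iou(pred_box, g) >= thresh for g in gt_boxes)

def max_n(per_frame_counts):
    """MaxN: the maximum count of individuals seen in any one frame."""
    return max(per_frame_counts, default=0)

# Hypothetical example: one detection against one ground-truth box,
# and MaxN over a short sequence of per-frame fish counts.
print(is_true_positive((10, 10, 50, 50), [(12, 8, 48, 52)]))  # True
print(max_n([0, 2, 5, 3, 1]))  # 5
```

mAP50 itself averages precision over the precision–recall curve built from these true/false-positive decisions, ranked by detection confidence; the sketch shows only the matching rule that defines a true positive.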