Long‐term monitoring is an important component of effective wildlife conservation. However, many methods for estimating density are too costly or difficult to implement over large spatial and temporal extents. Recently developed spatial mark–resight (SMR) models are increasingly being applied as a cost‐effective method to estimate density when data include detections of both marked and unmarked individuals. We developed a generalized SMR model that can accommodate long‐term camera data and auxiliary telemetry data for improved spatiotemporal inference in monitoring efforts. The model can be applied in two stages, with detection parameters estimated in the first stage using telemetry data and camera detections of instrumented individuals. Density is estimated in the second stage using camera data, with all individuals treated as unmarked. Serial correlation in detection and density parameters is accounted for using time‐series models. The two‐stage approach reduces computational demands and facilitates application to large data sets from long‐term monitoring initiatives. We applied the model to 3 years (2015–2017) of white‐tailed deer (Odocoileus virginianus) data collected in three study areas of the Big Cypress Basin, Florida, USA. In total, 59 females marked with ear tags and fitted with GPS‐telemetry collars were detected along with unmarked females on 180 remote cameras. Most of the temporal variation in density was driven by seasonal fluctuations, but one study area exhibited a slight population decline during the monitoring period. Modern technologies such as camera traps provide novel possibilities for long‐term monitoring, but the resulting massive data sets, which are subject to unique sources of observation error, have posed analytical challenges. The two‐stage SMR framework provides a solution with lower computational demands than joint SMR models, allowing for easier implementation in practice. In addition, after detection parameters have been estimated, the model may be used to estimate density even if no synchronous auxiliary information on marked individuals is available, which is often the case in long‐term monitoring.