Humans can fluidly adapt their interest in complex environments in ways that machines cannot. Here, we lay the groundwork for a real-world system that passively monitors and merges neural correlates of visual interest across team members via a collaborative brain-computer interface (cBCI). When group interest is detected and co-registered in time and space, it can be used to model the task relevance of items in a dynamic, natural environment. Previous work on cBCIs has focused on static stimuli, stimulus- or response-locked analyses, and often on models trained within subject and within experiment. The contributions of this work are twofold. First, we test the utility of cBCI in a scenario that more closely resembles natural conditions, in which subjects visually scanned a video for target items in a virtual environment. Second, we use an experiment-agnostic deep learning model to account for the real-world use case in which no training set exactly matches the end users' task and circumstances. With our approach, we show improved performance as the number of subjects in the cBCI ensemble grows, and the potential to reconstruct ground-truth target occurrence in an otherwise noisy and complex environment.