When a bug manifests in a user-facing application, it is likely to be exposed through the graphical user interface (GUI). Given the importance of visual information to the process of identifying and understanding such bugs, users are increasingly making use of screenshots and screen-recordings as a means to report issues to developers. However, when such information is reported en masse, such as during crowd-sourced testing, managing these artifacts can be a time-consuming process. As the reporting of screen-recordings in particular becomes more popular, developers are likely to face challenges related to manually identifying videos that depict duplicate bugs. Due to their graphical nature, screen-recordings present challenges for automated analysis that preclude the use of current duplicate bug report detection techniques. To overcome these challenges and aid developers in this task, this paper presents TA N G O , a duplicate detection technique that operates purely on video-based bug reports by leveraging both visual and textual information. TANGO combines tailored computer vision techniques, optical character recognition, and text retrieval. We evaluated multiple configurations of Ta n g o in a comprehensive empirical evaluation on 4,860 duplicate detection tasks that involved a total of 180 screenrecordings from six Android apps. Additionally, we conducted a user study investigating the effort required for developers to manually detect duplicate video-based bug reports and compared this to the effort required to use TA N G O . The results reveal that TA N G O 's optimal configuration is highly effective at detecting duplicate video-based bug reports, accurately ranking target duplicate videos in the top-2 returned results in 83% of the tasks. Additionally, our user study shows that, on average, TANGO can reduce developer effort by over 60%, illustrating its practicality.