In the past few years, video conferencing software has enjoyed exponential growth in popularity and market size. This technology enables participants in different geographic regions to hold virtual face-to-face meetings. It also allows participants to use virtual backgrounds to hide their real environment for privacy reasons or to reduce distractions, particularly in professional settings. In scenarios where users are not supposed to hide their actual locations, they may mislead other participants into believing that a displayed virtual background is real. In this paper, we propose a new publicly available dataset of virtual and real backgrounds in video conferencing software (e.g., Zoom, Google Meet, Microsoft Teams). The presented archive was evaluated through an exhaustive series of tests and scenarios using two well-known feature extraction methods: CRSPAM1372 and six co-mat. The first verification scenario considers the case where the detector is unaware of manipulated frames (i.e., the forensically edited frames are not part of the training set). A model trained on Zoom frames and tested on Google Meet frames can distinguish real background images from virtual ones with 99.80% detection accuracy. Furthermore, virtual backgrounds can be distinguished from real ones in videos created for video conferencing software at a similarly high detection rate of approximately 99.80%. Our results indicate that the proposed method greatly enhances detection accuracy and robustness under diverse adversarial conditions, making it a reliable technique for distinguishing real from virtual backgrounds in video communications. Given the described dataset and the preliminary experiments we performed, we expect this work to stimulate further research in this domain.
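The abstract names "six co-mat" as one of its feature extractors, which presumably refers to a set of co-occurrence matrices computed over frame pixels. As a rough illustration of that family of texture features only (not the authors' exact pipeline; the function names, bin count, and offsets below are hypothetical), a minimal NumPy sketch:

```python
import numpy as np

def cooccurrence_matrix(img, levels=8, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix for one offset.

    Counts how often quantized value i appears next to value j at the
    given (row, col) displacement, a classic texture descriptor of the
    kind co-mat features build on. Offsets are assumed non-negative
    here to keep the slicing simple.
    """
    q = (img.astype(np.int64) * levels) // 256  # quantize to `levels` bins
    dr, dc = offset
    h, w = q.shape
    a = q[:h - dr, :w - dc]  # reference pixels
    b = q[dr:, dc:]          # neighbors at the given offset
    mat = np.zeros((levels, levels))
    np.add.at(mat, (a.ravel(), b.ravel()), 1)  # accumulate pair counts
    return mat / mat.sum()   # normalize to a joint distribution

def comat_features(img, offsets=((0, 1), (1, 0), (1, 1))):
    """Concatenate co-occurrence matrices for several offsets into one
    feature vector, ready for any off-the-shelf classifier."""
    return np.concatenate(
        [cooccurrence_matrix(img, offset=o).ravel() for o in offsets]
    )

# A horizontally alternating pattern: adjacent pixels always differ,
# so the horizontal co-occurrence matrix has an empty diagonal and all
# the mass sits in the (lowest, highest) bin pair.
img = (np.indices((8, 8)).sum(axis=0) % 2) * 255
mat = cooccurrence_matrix(img, levels=8, offset=(0, 1))
print(mat[0, 7], mat[7, 0])  # each holds half of the pair mass
```

In a detection pipeline such as the one the abstract describes, feature vectors like these would be extracted from frames of one platform (e.g., Zoom) to train a binary classifier, then evaluated on frames from another platform (e.g., Google Meet) to test cross-software generalization.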
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are being extensively harnessed across a diverse range of domains, e.g., forensic science, healthcare, virtual assistants, cybersecurity, and robotics. On the flip side, they can also be exploited for negative purposes, such as producing authentic-looking fake news that propagates misinformation and diminishes public trust. Deepfakes are audio or visual multimedia contents that have been artificially synthesized or digitally modified through the application of deep neural networks. Deepfakes can be employed for benign purposes (e.g., refinement of face pictures for optimal magazine cover quality) or malicious intentions (e.g., superimposing faces onto explicit images or videos to harm individuals, or producing fake audio recordings of public figures making inflammatory statements to damage their reputation). With mobile devices and user-friendly audio and visual editing tools at hand, even non-experts can effortlessly craft intricate deepfakes and digitally alter audio and facial features. This presents challenges to contemporary computer forensic tools and human examiners alike, from ordinary individuals to digital forensic investigators. There is a perpetual battle between attackers armed with deepfake generators and defenders utilizing deepfake detectors. This paper first comprehensively reviews existing image, video, and audio deepfake databases with the aim of propelling next-generation deepfake detectors toward enhanced accuracy, generalization, robustness, and explainability. Then, the paper delves into open challenges and potential avenues for research in the field of audio and video deepfake generation and mitigation. The aspiration for this article is to complement prior studies and to assist newcomers, researchers, engineers, and practitioners in gaining a deeper understanding of, and in developing, innovative deepfake technologies.