With the development of the Internet, people encounter many types of information every day, including obscene videos. The volume of such videos keeps growing, which makes detecting and filtering them a crucial step in preventing their spread. However, detecting obscene content in obscure scenarios remains a significant challenge, for example indecent behavior performed while wearing normal clothing; content of this kind can cause serious harm, such as a negative influence on children. To address this issue, this paper proposes a multi-frame obscene video detection method based on the Vision Transformer (ViT), which aims to automatically detect and filter obscene content in videos. Extensive experiments on the public NPDI dataset show that the method outperforms existing state-of-the-art methods, achieving 96.2%. It also achieves satisfactory classification accuracy on a dataset of obscure obscene videos. The method thus provides a powerful tool for future video censorship and helps protect minors and the general public.