Many videos uploaded to online video platforms contain adult content that violates these platforms' policies and should be removed immediately. To recognize obscene videos, we developed a model that can process video frames in real-time while also adapting to time budget or hardware processing capacity. Thus, a deep convolutional neural network with multiple outputs was used. A decision-maker module was then designed to decide which neural network outputs to process and which label to assign to each frame. Using the reinforcement learning method, the decision-maker module is trained based on the results of previous frames as well as the results of neural network outputs while keeping the time budget in mind. Experiments showed that sacrificing a small amount of accuracy can increase speed by up to 4.7 times over the base model. We conclude that using a content correlation between consecutive frames not only reduces processing time by eliminating unnecessary frame processing but also improves the accuracy of the frame classification. It was also discovered that while using more of the previous frames, increases processing speed, the error in classifying the frame increases when the scene is changed.