To improve the clarity of video images and reduce the omission of subtle content, this paper uses computer vision techniques to convert video frames into data the system can recognize, quantizes their gray levels, and represents each frame as a two-dimensional array. A direction constraint is added to the existing constraints to construct a difference-of-Gaussians (DoG) pyramid of the video image, and the extreme points of the target image are searched for to obtain the projection space of the image's unmixing matrix. The median value within each filter window is taken as the filter output, image edges are sharpened, and image details are enhanced in combination with the inverse Fourier transform to complete the video image processing. The results show that the average vector processing time is 4.74 µs, that image data processing is fast, and that the processed images are of high definition.
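The abstract names the processing steps but not their parameters. The following is a minimal NumPy sketch of one plausible single-frame version of that pipeline, under explicit assumptions: BT.601 luma weights for the gray conversion, a 3 × 3 median window, a single DoG octave with base scale 1.6 and scale factor √2, a 3 × 3 × 3 neighbourhood test for extreme points, and a Gaussian high-frequency-emphasis filter standing in for the Fourier-domain sharpening step. All function names and parameter values are hypothetical, and the direction constraint and the unmixing-matrix projection are not modelled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def quantize_gray(frame_rgb, levels=256):
    """Convert an H x W x 3 RGB frame (0-255) to a 2-D gray array with `levels` gray levels."""
    rgb = np.asarray(frame_rgb, dtype=np.float64)
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # assumed BT.601 luma weights
    gray = np.clip(np.rint(gray), 0, 255)
    step = 256 // levels
    return (gray // step) * step


def median_filter(img, k=3):
    """Replace each pixel with the median of its k x k window (borders reflected)."""
    padded = np.pad(img, k // 2, mode="reflect")
    return np.median(sliding_window_view(padded, (k, k)), axis=(-2, -1))


def gaussian_blur(img, sigma):
    """Separable Gaussian blur; the output has the same shape as `img`."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Convolve every row, then every column; "valid" mode removes the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def dog_stack(img, sigma0=1.6, n_scales=4, k=2 ** 0.5):
    """One octave of a difference-of-Gaussians pyramid, shape (n_scales, H, W)."""
    blurred = [gaussian_blur(img, sigma0 * k**i) for i in range(n_scales + 1)]
    return np.stack([blurred[i + 1] - blurred[i] for i in range(n_scales)])


def dog_extrema(dog, threshold=0.01):
    """(scale, row, col) of points that are strict extrema of their 3 x 3 x 3 neighbourhood."""
    win = sliding_window_view(dog, (3, 3, 3))
    flat = win.reshape(*win.shape[:3], 27)
    centre = flat[..., 13]  # index 13 is the middle of the flattened 3 x 3 x 3 cube
    neighbours = np.delete(flat, 13, axis=-1)
    is_ext = (centre > neighbours.max(axis=-1)) | (centre < neighbours.min(axis=-1))
    return np.argwhere(is_ext & (np.abs(centre) > threshold)) + 1  # shift to `dog` indices


def fft_sharpen(img, d0=0.1, a=1.0, b=0.5):
    """High-frequency emphasis in the Fourier domain, returned via the inverse FFT."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    high_pass = 1.0 - np.exp(-(fx**2 + fy**2) / (2 * d0**2))
    spectrum = np.fft.fft2(img) * (a + b * high_pass)
    return np.real(np.fft.ifft2(spectrum))


def process_frame(frame_rgb):
    """Quantize -> median filter -> DoG extrema -> Fourier-domain sharpening."""
    gray = quantize_gray(frame_rgb)
    denoised = median_filter(gray, k=3)
    keypoints = dog_extrema(dog_stack(denoised / 255.0))
    enhanced = np.clip(fft_sharpen(denoised), 0, 255)
    return enhanced, keypoints
```

Because the emphasis filter has gain `a = 1` at zero frequency, the mean brightness of a frame is left unchanged, while `b` controls how strongly edges and fine detail are amplified before the inverse FFT.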