Background subtraction is a fundamental task in computer vision with numerous real-world applications, ranging from object tracking to video surveillance. Dynamic backgrounds, however, pose a significant challenge. While various methods have been proposed for background subtraction, supervised deep learning-based techniques are currently considered state-of-the-art, yet they require pixel-wise ground-truth labels, which are time-consuming and expensive to obtain. In this work, we propose a weakly supervised framework that performs background subtraction without per-pixel ground-truth labels. Our framework is trained on a moving-object-free sequence of images and comprises two networks. The first network is an autoencoder that generates static background images and prepares dynamic background images for training the second network; the dynamic background images are obtained by thresholding the background-subtracted images. The second network is a U-Net that is trained on the same moving-object-free video, using the dynamic background images as pixel-wise ground-truth labels. During the test phase, the input images are processed by the autoencoder and the U-Net, which generate static and dynamic background images, respectively. The dynamic background image is used to remove dynamic motion from the background-subtracted image, yielding a foreground image free of dynamic artifacts. To demonstrate the effectiveness of our method, we conducted experiments on selected categories of the CDnet 2014 dataset and the I2R dataset. Our method outperformed all top-ranked unsupervised methods. It also surpassed one of the two existing weakly supervised methods, and achieved results comparable to the other with a shorter running time. The proposed method is online, real-time, efficient, and requires minimal frame-level annotation, making it suitable for a wide range of real-world applications.
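
The sketch below illustrates, in PyTorch-style pseudocode, how the two network outputs described above could be combined at test time: the autoencoder yields a static background for subtraction, and the U-Net output suppresses dynamic-background pixels from the thresholded difference image. This is a minimal sketch under stated assumptions; the module definitions, the threshold value, and the tensor shapes are illustrative placeholders, not the paper's implementation.

```python
# Minimal sketch of the test-phase pipeline: static background subtraction
# followed by dynamic-background suppression. All architectures and the
# threshold value are illustrative assumptions.
import torch
import torch.nn as nn


class TinyAutoencoder(nn.Module):
    """Stand-in for the autoencoder that reconstructs the static background."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)


class TinyUNet(nn.Module):
    """Stand-in for the U-Net that predicts the dynamic background image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)


def foreground_mask(frame, autoencoder, unet, threshold=0.1):
    """Background-subtract a frame and remove dynamic-background pixels."""
    with torch.no_grad():
        static_bg = autoencoder(frame)    # static background estimate
        dynamic_bg = unet(frame)          # dynamic background estimate
    # Background-subtracted image, thresholded to a raw foreground mask.
    diff = (frame - static_bg).abs().mean(dim=1, keepdim=True)
    raw_fg = (diff > threshold).float()
    # Suppress pixels flagged as dynamic background by the U-Net.
    dyn_mask = (dynamic_bg > threshold).float()
    return raw_fg * (1.0 - dyn_mask)


if __name__ == "__main__":
    frame = torch.rand(1, 3, 64, 64)      # dummy input frame in [0, 1]
    mask = foreground_mask(frame, TinyAutoencoder(), TinyUNet())
    print(mask.shape)                     # torch.Size([1, 1, 64, 64])
```

In practice the two networks would first be trained on the moving-object-free sequence as described above; the sketch only shows how their outputs are combined per frame, which is what makes the method suitable for online, real-time use.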