Continuous monitoring with fixed-point cameras is effective for detecting scum early and understanding its behavior in urban tidal rivers. Previous studies have developed scum-detection techniques based on U-Net. However, the considerable time and effort required to create the label images needed for training make it difficult to apply these techniques at multiple locations. In this study, we developed a new learning method that uses dummy images and evaluated its effectiveness against conventional methods using four indicators: precision, recall, F-value, and mean intersection over union (mIoU). The results show that our method detects scum with higher accuracy than conventional methods while substantially reducing the effort required to create labels, which is the bottleneck in conventional model training. Our method makes it possible to understand the spatiotemporal behavior of scum over a wide range. Additionally, when applied to suspended solids other than scum, the method can serve as a general-purpose technique for the continuous monitoring of river debris.
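As an illustration of the four evaluation indicators named above, the following minimal sketch computes them for a pair of binary masks (1 = scum, 0 = background). It is not the evaluation code used in this study. The function names, the nested-list mask format, and the choice to average IoU over two classes (scum and background) are assumptions made for the example.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, truth):
    """Precision, recall, F-value, and two-class mIoU (assumed definition)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f_value = safe_div(2 * precision * recall, precision + recall)
    iou_scum = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    # Assumption: mIoU is the plain average of the per-class IoU values.
    miou = (iou_scum + iou_background) / 2
    return {
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
        "miou": miou,
    }


# Hypothetical 2x3 masks, for illustration only.
predicted = [[1, 1, 0],
             [0, 1, 0]]
labelled = [[1, 0, 0],
            [0, 1, 1]]
metrics = segmentation_metrics(predicted, labelled)
```

For these toy masks there are two true positives, one false positive, one false negative, and two true negatives, so precision, recall, and F-value are each 2/3 and the two-class mIoU is 0.5.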