“…To cope with the large variance in dog size (range: 1.64–36 kg), the modified attention U-Net in this study was designed to have deeper feature extraction (i.e., multiscale features) than the original attention U-Net (22) architecture. The designed network extracts features at 7 levels, reducing the spatial resolution from (1024, 512) to (8, 16) for height and width, respectively. The filter dimensions of the model (F1, F2, F3, F4, F5, F6, F7) were selected as 16, 32, 64, 128, 256, 512, and 1024, respectively.…”
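
The level-by-level arithmetic behind this description can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes each level after the first halves height and width (e.g., 2×2 pooling with stride 2) and that the filter count doubles per level starting from 16, which reproduces the quoted filter list. The function name `encoder_shapes` and its parameters are hypothetical. Under these assumptions, six halvings of a 1024 × 512 input give a deepest feature map of 16 × 8, the same two values as the quoted (8, 16) but with the axes in the opposite order.

```python
def encoder_shapes(height=1024, width=512, base_filters=16, levels=7):
    """Return (level, filters, height, width) for each encoder level.

    Assumption: every level after the first halves both spatial
    dimensions and doubles the number of filters.
    """
    shapes = []
    for level in range(levels):
        scale = 2 ** level  # cumulative downsampling factor at this level
        shapes.append(
            (level + 1, base_filters * scale, height // scale, width // scale)
        )
    return shapes


if __name__ == "__main__":
    for level, filters, h, w in encoder_shapes():
        print(f"F{level}: {filters:4d} filters, feature map {h} x {w}")
```

Changing `levels` or `base_filters` shows how quickly the deepest feature map shrinks, which is the trade-off a deeper encoder makes to capture features at more scales.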