The resolution of computed depth maps limits 3D metric estimation of fish in underwater environments. This paper addresses this problem with object-based matching for underwater fish tracking and depth computation using convolutional neural networks (CNNs). First, for each frame of a stereo video, a joint object classification and semantic segmentation CNN segments fish objects from the background. Next, the fish objects in these images are cropped and matched to compute subpixel disparities with a video interpolation CNN. The resulting disparities and depth values are used to estimate the fish metrics, including length, height, and weight. The fish are then tracked across the frames of the input stereo video so that their metrics can be computed frame by frame, and the median of each fish's metrics is taken to reduce noise caused by fish motion. Finally, fish with incorrect stereo measurements are removed before generating the final fish metric distributions, which serve as inputs for learning decision models to manage a fish farm. We also constructed underwater stereo video datasets with ground-truth fish metrics measured by humans to verify the effectiveness of our approach. Experimental results show a 5% error rate in our fish length estimation.

INDEX TERMS convolutional neural network, object tracking, object-based stereo matching

I. INTRODUCTION
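The disparity-to-depth step summarized in the abstract rests on standard stereo triangulation. A minimal sketch, assuming a rectified stereo rig with known focal length f (in pixels) and baseline B (in meters); the function and variable names are illustrative, not taken from the paper:

```python
# Sketch of recovering fish depth and length from a subpixel disparity,
# assuming a rectified stereo pair. All names here are hypothetical.

def depth_from_disparity(d_px: float, f_px: float, baseline_m: float) -> float:
    """Standard triangulation for a rectified pair: Z = f * B / d."""
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / d_px

def length_from_pixels(length_px: float, depth_m: float, f_px: float) -> float:
    """Back-project an image-plane length at depth Z: L = l_px * Z / f."""
    return length_px * depth_m / f_px

# Example with assumed rig parameters: f = 1400 px, B = 0.12 m,
# and a matched fish pair with subpixel disparity d = 33.6 px.
z = depth_from_disparity(33.6, 1400.0, 0.12)      # -> 5.0 (meters)
fish_len = length_from_pixels(120.0, z, 1400.0)   # -> ~0.43 (meters)
```

Because depth scales inversely with disparity, the subpixel accuracy of the matching step directly bounds the accuracy of the recovered length, which is why the abstract emphasizes subpixel disparity computation.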