The bucket fill factor, defined as the percentage of material loaded into the bucket in a single scoop, is a key measure of the productivity of construction vehicles. The location of the bucket is likewise indispensable for planning scooping trajectories. Previous research has measured the fill factor with state-of-the-art computer vision approaches, but has not considered robustness to varying environmental conditions. This study aims to fill that gap by including six distinct environment settings. Images captured by a stereo camera are used to generate point clouds, which are then structured into 3D maps. This preprocessing pipeline for deep learning is novel, and its feasibility is validated in this study. Moreover, multitask learning is employed to exploit the positive relationship between two tasks: fill factor prediction and bucket detection. After preprocessing, the 3D maps are therefore forwarded to a faster region-based convolutional neural network (Faster R-CNN) incorporating an improved residual neural network (ResNet). The fill factor is obtained via a novel classification- and probability-based approach, and the method achieves promising results on both tasks (overall volume estimation accuracy: 95.23%; detection precision: 92.62%).
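To illustrate how a fill factor value might be derived from a classifier, the following minimal sketch treats the fill factor as a set of discrete classes and takes the probability-weighted average of the class centers. This is one plausible reading of a classification and probabilistic-based approach, not necessarily the procedure used in this study. The function names `softmax` and `expected_fill_factor`, the choice of class centers, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return [value / total for value in exps]


def expected_fill_factor(logits, bin_centers):
    """Return the probability-weighted mean of the fill factor classes.

    logits      -- one raw score per fill factor class (assumed classifier output)
    bin_centers -- representative fill factor (in percent) for each class;
                   these values are an assumption made for illustration
    """
    if len(logits) != len(bin_centers):
        raise ValueError("logits and bin_centers must have the same length")
    probabilities = softmax(logits)
    return sum(p * c for p, c in zip(probabilities, bin_centers))


if __name__ == "__main__":
    # Hypothetical example: five classes centered at 0%, 25%, 50%, 75%, 100%.
    centers = [0.0, 25.0, 50.0, 75.0, 100.0]
    scores = [0.1, 0.4, 2.0, 1.5, 0.2]
    print(f"Estimated fill factor: {expected_fill_factor(scores, centers):.2f}%")
```

Compared with reporting only the most probable class, a weighted average of this kind can return values that lie between class centers, which may be useful when the classes are coarse.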