The rapid increase in waste electrical and electronic equipment (WEEE) has created a need for more advanced sensor-based systems to sort this complex waste stream. This study therefore proposes a method for object detection, instance segmentation, and mass estimation of plastics and contaminants based on the fusion of RGB and depth (D) images. The methodology builds on Faster R-CNN and Mask R-CNN, extended with an additional head for mass estimation. In addition, a pre-processing method that produces an enhanced depth image (ED) is proposed. To evaluate the data fusion and the pre-processing method, two data sets of plastics and impurities were created, containing images with and without overlapping samples. The first data set contains 174 RGB images and depth maps of 3146 samples without mass values, while the second contains 42 RGB and D images of 766 samples together with their mass. Both data sets were used to evaluate the performance of Mask R-CNN and Faster R-CNN; the second data set was additionally used to evaluate the network with the additional mass-estimation head. Using the network Resnet50_FPN_RGBED, the proposed method achieved an R² of 0.75, an RMSE of 1.39, and an MAE of 0.81 for detections with an IoU greater than 50%. Hence, it can be concluded that the presented method can distinguish plastics from other materials with good accuracy. Furthermore, the mass of each detected particle can be estimated individually, which is of great relevance for the recycling sector: knowing the mass distribution and the percentage of contaminants in a mixed-plastics waste stream can be valuable for adjusting the parameters of upstream and downstream sorting processes.
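To make the reported mass-estimation metrics concrete, the following minimal Python sketch shows one way R², RMSE, and MAE could be computed over detections whose bounding box overlaps a ground-truth sample with an IoU greater than 50%. It is an illustration under stated assumptions, not the evaluation code of this study: the function names `box_iou` and `mass_metrics`, the `(box, mass)` input format, and the greedy one-to-one matching rule are all hypothetical choices, and the matching could equally be done on instance masks.

```python
import math


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mass_metrics(predictions, ground_truth, iou_threshold=0.5):
    """Compute R^2, RMSE and MAE of predicted mass over matched detections.

    predictions, ground_truth: lists of (box, mass) tuples (assumed format).
    Each prediction is greedily matched to the not-yet-used ground-truth
    sample with the highest IoU, and kept only if that IoU exceeds the
    threshold (assumed matching rule).
    """
    pairs = []   # (true_mass, predicted_mass) for every matched detection
    used = set() # indices of ground-truth samples already matched
    for p_box, p_mass in predictions:
        best_j, best_iou = None, iou_threshold
        for j, (g_box, _) in enumerate(ground_truth):
            if j in used:
                continue
            iou = box_iou(p_box, g_box)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            used.add(best_j)
            pairs.append((ground_truth[best_j][1], p_mass))

    if not pairs:
        raise ValueError("no detection matched a ground-truth sample")

    n = len(pairs)
    mean_true = sum(t for t, _ in pairs) / n
    ss_res = sum((t - p) ** 2 for t, p in pairs)    # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t, _ in pairs)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in pairs) / n,
        "n_matched": n,
    }


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    gt = [((0, 0, 10, 10), 2.0), ((20, 20, 30, 30), 4.0), ((50, 50, 60, 60), 6.0)]
    pred = [((0, 0, 10, 10), 2.5), ((20, 20, 30, 30), 3.5), ((100, 100, 110, 110), 9.0)]
    print(mass_metrics(pred, gt))
```

In this sketch, unmatched detections and missed samples do not enter the regression metrics at all, so the values describe mass accuracy only for correctly localized particles; detection quality would have to be reported separately.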