Robotic perception systems often rely on methods that extract useful features or information from a dataset. These methods typically apply deep learning approaches, such as convolutional neural networks (CNNs), to image processing, increasingly combined with 3D data. Image classification is well established through convolutional networks, but some network architectures are large and demand considerable time and memory. Neural networks such as FlowNet3D and PointFlowNet, on the other hand, can accurately estimate scene flow, i.e., the three-dimensional motion of point clouds (PCs) in a dynamic environment. When PCs are used in robotic applications, it is crucial to assess how robustly the points belonging to an object can be identified. This article examines robotic perception systems for autonomous vehicles and the inherent difficulties in analyzing and processing information obtained from diverse sensors. The authors propose a late-fusion approach that combines the outputs of multiple classifiers to improve classification accuracy, as well as a weighted fusion technique that incorporates the distance to objects as a significant factor. The results show that the proposed fusion methods outperform both single-modality classification and classic fusion strategies.
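To make the idea of distance-weighted late fusion concrete, the sketch below combines per-class probabilities from two modality classifiers (e.g., camera and LiDAR), weighting each modality by the object's distance. This is a minimal illustration, not the authors' exact method: the function name, the linear weighting scheme, and the `d_max` cutoff are all hypothetical assumptions.

```python
import numpy as np

def weighted_late_fusion(probs_camera, probs_lidar, distance, d_max=50.0):
    """Fuse per-class probabilities from two classifiers (hypothetical scheme).

    Assumption for illustration: the LiDAR classifier is weighted more
    heavily for nearby objects (where the point cloud is dense), and the
    camera classifier more heavily as distance grows toward d_max.
    """
    # Linear distance-based weight, clipped to [0, 1]
    w_lidar = max(0.0, 1.0 - distance / d_max)
    w_camera = 1.0 - w_lidar

    # Weighted sum of the two probability vectors
    fused = w_camera * np.asarray(probs_camera) + w_lidar * np.asarray(probs_lidar)

    # Renormalize so the result is again a probability distribution
    return fused / fused.sum()

# Example: a nearby object (10 m) lets the LiDAR prediction dominate
fused = weighted_late_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], distance=10.0)
```

In this example the LiDAR weight is 0.8, so the fused decision follows the LiDAR classifier's preferred class even though the camera classifier disagrees; at larger distances the balance would shift toward the camera.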