2019
DOI: 10.1115/1.4043222

Multimodal Fusion Object Detection System for Autonomous Vehicles

Abstract: In order for autonomous vehicles to safely navigate roadways, accurate object detection must take place before safe path planning can occur. Currently, general-purpose object detection convolutional neural network (CNN) models have the highest detection accuracies of any method. However, there is a gap in the proposed detection frameworks: those that provide the high detection accuracy necessary for deployment do not perform inference in real time, and those that perform inference in real time…

Cited by 24 publications (7 citation statements)
References 32 publications
“…Multimodal deep learning models that can ingest pixel data along with other data types (fusion) have been successful in applications outside of medicine, such as autonomous driving and video classification. As an example, a multimodal fusion detection system for autonomous vehicles, that combines visual features from cameras along with data from Light Detection and Ranging (LiDAR) sensors, is able to achieve significantly higher accuracy (3.7% improvement) than a single-modal CNN detection model 21 . Similarly, a multimodal social media video classification pipeline leveraging both visual and textual features increased the classification accuracy to 88.0%, well above single modality neural networks such as Google’s InceptionV3 which reached an accuracy of 76.4% on the same task 22 .…”
Section: Introduction
confidence: 99%
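The multimodal fusion idea described above — combining visual features from a camera branch with features derived from LiDAR — can be illustrated with a minimal late-fusion sketch. This is an assumption-laden illustration only: the cited system's actual fusion architecture is not specified in this excerpt, and the feature dimensions below are hypothetical.

```python
import numpy as np

def fuse_features(camera_feat, lidar_feat):
    """Late fusion by concatenation: join per-object feature vectors
    from a camera CNN branch and a LiDAR branch into a single vector
    that a shared detection head could score. (Illustrative sketch;
    not the cited paper's exact architecture.)"""
    return np.concatenate([camera_feat, lidar_feat], axis=-1)

# Hypothetical sizes: 256-d camera features, 64-d LiDAR features,
# for 10 candidate objects in a frame.
camera_feat = np.random.rand(10, 256)
lidar_feat = np.random.rand(10, 64)
fused = fuse_features(camera_feat, lidar_feat)
print(fused.shape)  # (10, 320)
```

Concatenation is the simplest fusion strategy; real systems often add learned projection layers or attention over the two modalities before the detection head.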
“…Representative examples of such methodologies are the Sliding Shapes [23], Vote3D [24], VoxelNet [25], MFDS [26], MEGVII [27], 3D FCN [28], Vote3Deep [24], SECOND [29], Patch Refinement [30], Fast Point R-CNN [31], Voxel-FPN [32], PV-RCNN [33], HotSpotNet [67], 3DBN [34], Fusion of Fusion Net [35] and Point A² Net [36].…”
Section: Data Representation Approaches
confidence: 99%
“…Several methods, such as Frustum PointNet (F-PointNet) [46], Multi-View 3D Object Detection (MV3D) [51], PointFusion [43], Aggregate View Object Detection network (AVOD) [56], RegiOn Approximation Refinement Network (Roarnet) [44], ContFuse [58], Multimodal Fusion Detection System (MFDS) [26], Multi-task Multi-sensor Fusion (MMF) [59], SCANet [57], PointPainting [45], SIFRNet [48], Complex-Retina [60] and LaserNet++ [61], propose a combination of images and LiDAR data to improve object detection accuracy. This allows object detection in difficult scenarios, such as classifying small objects (pedestrians, cyclists) or distant objects, which is one of the limitations of LiDAR-only object detection models.…”
Section: A. Fusion-Based Solutions
confidence: 99%
“…The remaining points in the point cloud were divided into clusters according to the Euclidean distance. Thus, the length of the object, its distance from the center and its direction were determined [26]. Babak et al proposed a new multi-sensor fusion pipeline configuration for object detection and tracking.…”
Section: Introduction
confidence: 99%
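The Euclidean-distance clustering step mentioned in the citation above can be sketched minimally: two points belong to the same cluster if they lie within a tolerance of each other, directly or through a chain of neighbors. This is an illustrative O(n²) sketch, not the cited paper's implementation; production pipelines typically accelerate the neighbor search with a k-d tree.

```python
import numpy as np

def euclidean_cluster(points, tolerance):
    """Group 3-D points into connected clusters under a distance
    threshold, via flood-fill over the pairwise-distance graph.
    Illustrative sketch only (naive O(n^2) neighbor search)."""
    n = len(points)
    labels = [-1] * n          # -1 means "not yet assigned"
    cluster_id = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = cluster_id
        while stack:
            i = stack.pop()
            dists = np.linalg.norm(points - points[i], axis=1)
            for j in np.flatnonzero(dists <= tolerance):
                if labels[j] == -1:
                    labels[j] = cluster_id
                    stack.append(int(j))
        cluster_id += 1
    return labels

# Two well-separated pairs of points yield two clusters.
pts = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 0, 0], [5.1, 0, 0]])
print(euclidean_cluster(pts, tolerance=0.5))  # [0, 0, 1, 1]
```

From each cluster, per-object properties such as extent, centroid distance, and heading can then be estimated, as the citation describes.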