As power system equipment ages, the automated disassembly of transformers has become a critical research area for improving both efficiency and safety. This paper presents a transformer disassembly system for power systems that leverages multimodal perception and collaborative processing. By fusing 2D images and 3D point cloud data captured by RGB-D cameras, the system combines multimodal data fusion, deep learning models, and control technologies to enable precise recognition and efficient disassembly of transformer covers and internal components. The system employs an enhanced YOLOv8 model to locate and identify screw-fastened covers, and uses the STDC network for segmentation and cutting path planning of welded covers. In addition, the system captures 3D point cloud data of the transformer's interior using multi-view RGB-D cameras and performs multimodal semantic segmentation and object detection with the ODIN model, enabling high-precision identification and cutting of complex components such as windings, studs, and silicon steel sheets. Experimental results show that the system achieves a recognition accuracy of 99% for both covers and internal components, with a disassembly success rate of 98%, demonstrating high adaptability and safety in complex industrial environments.