HeightNet: Monocular Object Height Estimation

Kim, In Su; Kim, Hyeongbok; Lee, Seungwon; Jung, Soon Ki

doi:10.3390/electronics12020350

Electronics

2023

Abstract: Monocular depth estimation is a traditional computer vision task that predicts the distance of each pixel relative to the camera from one 2D image. Relative height information about objects lying on a ground plane can be calculated through several processing steps from the depth image. In this paper, we propose a height estimation method for directly predicting the height of objects from a 2D image. The proposed method utilizes an encoder-decoder network for pixel-wise dense prediction based on height consiste…
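The abstract notes that relative height can be derived from a depth image through several processing steps. The sketch below illustrates one conventional way to do that, and it is not the method proposed in the paper: it back-projects a depth map with a pinhole camera model and measures each point's signed distance to a known ground plane. The function names, the intrinsics (fx, fy, cx, cy), and the plane parameters are all assumptions made for illustration.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an HxW depth map (distance along the optical axis) to HxWx3 camera-frame points.

    Assumes an ideal pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def height_above_plane(points, normal, offset):
    """Signed distance of each point to the plane normal . p + offset = 0.

    The plane is assumed to be known in the camera frame; in practice it
    would have to be calibrated or fitted to ground pixels first.
    """
    normal = np.asarray(normal, dtype=float)
    return (points @ normal + offset) / np.linalg.norm(normal)

# Example: camera 1.5 units above the ground, optical axis horizontal, y axis pointing down.
depth = np.full((4, 6), 10.0)
points = backproject(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)
height_map = height_above_plane(points, normal=(0.0, -1.0, 0.0), offset=1.5)
```

Here the pixel at the principal point lies on the optical axis, so its height equals the camera height, and pixels in lower rows map to points closer to the ground. A direct height predictor of the kind the abstract describes would skip these geometric steps.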

Cited by 2 publications

(1 citation statement)

References 26 publications

“…In recent years, numerous researchers have introduced various monocular image depth estimation models based on the encoder-decoder architecture, as shown in Figure 1 [14,15]. This architecture is divided into two parts: the encoder, which extracts depth features from images, and the decoder, which predicts depth information.…”

mentioning

confidence: 99%

Edge-Enhanced Dual-Stream Perception Network for Monocular Depth Estimation

Liu, Wang. Electronics, 2024.

Estimating depth from a single RGB image has a wide range of applications, such as in robot navigation and autonomous driving. Currently, Convolutional Neural Networks based on encoder–decoder architecture are the most popular methods to estimate depth maps. However, convolutional operators have limitations in modeling large-scale dependence, often leading to inaccurate depth predictions at object edges. To address these issues, a new edge-enhanced dual-stream monocular depth estimation method is introduced in this paper. ResNet and Swin Transformer are combined to better extract global and local features, which benefits the estimation of the depth map. To better integrate the information from the two branches of the encoder and the shallow branch of the decoder, we designed a lightweight decoder based on the multi-head Cross-Attention Module. Furthermore, in order to improve the boundary clarity of objects in the depth map, a loss function with an additional penalty for depth estimation error on the edges of objects is presented. The results on three datasets, NYU Depth V2, KITTI, and SUN RGB-D, show that the method presented in this paper achieves better performance for monocular depth estimation. Additionally, it has good generalization capabilities for various scenarios and real-world images.
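The abstract above mentions a loss with an additional penalty for depth errors on object edges, without giving its form. As a rough illustration only, and not the loss defined by Liu and Wang, the sketch below weights an L1 depth error more heavily where the ground-truth depth gradient is large. The names edge_mask and edge_weighted_l1, the gradient threshold, and the weight value are hypothetical choices.

```python
import numpy as np

def edge_mask(depth_gt, threshold=0.5):
    """Mark pixels whose ground-truth depth gradient magnitude exceeds a threshold."""
    gy, gx = np.gradient(depth_gt)  # finite differences along rows, then columns
    return np.hypot(gx, gy) > threshold

def edge_weighted_l1(pred, gt, edge_weight=2.0, threshold=0.5):
    """Weighted mean absolute error with extra weight on edge pixels.

    Non-edge pixels get weight 1, edge pixels get weight 1 + edge_weight.
    """
    weights = np.where(edge_mask(gt, threshold), 1.0 + edge_weight, 1.0)
    return float(np.sum(weights * np.abs(pred - gt)) / np.sum(weights))

# Example: a depth step between two flat regions.
gt = np.zeros((4, 6))
gt[:, 3:] = 2.0
pred = gt.copy()
pred[0, 2] += 1.0  # an error placed on the depth discontinuity
loss = edge_weighted_l1(pred, gt)
```

With these settings the same unit error costs three times as much on the depth step as it would in a flat region, which is the general effect an edge penalty is meant to have.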

Research on Camera Rotation Strategies for Active Visual Perception in the Self-Driving Vehicles

Kong, Shi, Yan et al. Actuators, 2024.

Aiming at the problem of blind field of view caused by the change in the vehicle’s yaw angle when the self-driving vehicle is turning or changing lanes, this paper proposes a camera rotation strategy based on monocular active environment sensing, which realizes the detection of the blind field of view when the vehicle’s yaw angle changes in the self-driving vehicle. Based on the two-degrees-of-freedom dynamic model, the camera rotation angle control is achieved by controlling the front-wheel angle of the vehicle. A camera control module is designed using Simulink to control the camera in real-time, allowing it to rotate based on different driving scenes. The effect of obstacle detection by traditional vision sensors and active vision sensors is tested under different vehicle driving scenes. The results demonstrate that the obstacle detection effect of the camera rotation strategy based on monocular active environment perception, as designed in this paper, is better than the traditional monocular vision.
