2022 IEEE 7th International Conference for Convergence in Technology (I2CT)
DOI: 10.1109/i2ct54291.2022.9824488

Focal-WNet: An Architecture Unifying Convolution and Attention for Depth Estimation

Cited by 7 publications (4 citation statements) | References 22 publications
“…In terms of error performance, our TP-GAN obtains the lowest RMS error, by a margin of 0.009, when compared with the closely competitive method of [10]. While our quantitative results are not as good as those of [51] on the KITTI data, ours outperforms the vast majority of previous methods [7], [11], [12], [16], [45], [46], [47], [48], [49], [50]. Among the approaches, our TP-GAN achieves the best SQREL and RMSE LOG scores.…”
Section: A. Non-adversarial Models (mentioning)
Confidence: 64%
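
The error measures named above (RMS/RMSE, SQREL, RMSE LOG) are the standard KITTI depth-evaluation metrics. As a minimal illustrative sketch (not code from the cited papers; the function and variable names are ours), they can be computed as:

import numpy as np

def depth_error_metrics(pred, gt):
    """pred, gt: arrays of predicted / ground-truth depths at valid (positive) pixels."""
    rmse = np.sqrt(np.mean((pred - gt) ** 2))                     # RMSE
    sq_rel = np.mean(((pred - gt) ** 2) / gt)                     # SQREL (squared relative error)
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)) # RMSE LOG
    return {"rmse": rmse, "sq_rel": sq_rel, "rmse_log": rmse_log}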
“…In terms of accuracy, as demonstrated in Tab. 4, our technique surpasses all nine previous adversarial works [19], [20], [21], [22], [23], [24], [25], [26], [27] as well as the non-adversarial methods [7], [11], [12], [16], [45], [46], [47], [48], [49], [50] by significant margins for all three thresholds δ < 1.25, δ < 1.25², and δ < 1.25³, but performs slightly below the work in [51]. In Tab.…”
Section: KITTI Dataset (mentioning)
Confidence: 80%
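
The threshold accuracies δ < 1.25, δ < 1.25², and δ < 1.25³ referred to above are likewise standard depth-estimation metrics. A minimal sketch of how they are typically computed (the helper name is illustrative, not from the cited papers):

import numpy as np

def threshold_accuracies(pred, gt):
    """Fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25**k, for k = 1, 2, 3."""
    ratio = np.maximum(pred / gt, gt / pred)
    return {f"delta<1.25^{k}": np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)}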
“…Then, the features from the CNN and the Transformer are fed into a cross-attention module to model the dependency between the global and local information. Similarly to Depth-Former, Manimaran et al. also exploited two separate encoders to extract global and local features [77]. The Transformer-based encoder uses images of 224 × 224 as inputs, while the encoder based on DenseNet [78] uses images of 512 × 512 as inputs.…”
Section: Supervised Training (mentioning)
Confidence: 99%
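
As a rough illustration of the dual-encoder-plus-cross-attention design described in this statement, the following PyTorch sketch pairs a small CNN stand-in for the DenseNet branch with a patch-embedding Transformer stand-in for the global branch, then fuses them with cross-attention. All module names, sizes, and hyperparameters are assumptions, not the implementation of [77] or Depth-Former.

import torch
import torch.nn as nn

class DualEncoderCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Local branch: a small CNN stand-in for the DenseNet-based encoder
        # (the cited work feeds it 512 x 512 images).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=16, padding=3),
            nn.ReLU(inplace=True),
        )
        # Global branch: a patch embedding plus one Transformer encoder layer,
        # standing in for the ViT-style encoder that takes 224 x 224 inputs.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.transformer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        # Cross-attention: CNN tokens query the Transformer tokens, modelling
        # the dependency between local and global features.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_hi, img_lo):
        # img_hi: (B, 3, 512, 512) for the CNN branch; img_lo: (B, 3, 224, 224).
        local_feat = self.cnn(img_hi).flatten(2).transpose(1, 2)       # (B, N_local, dim)
        global_tok = self.patch_embed(img_lo).flatten(2).transpose(1, 2)
        global_feat = self.transformer(global_tok)                     # (B, N_global, dim)
        fused, _ = self.cross_attn(local_feat, global_feat, global_feat)
        return fused  # fused tokens would then feed a depth decoder

fused = DualEncoderCrossAttention()(torch.randn(1, 3, 512, 512),
                                    torch.randn(1, 3, 224, 224))

Letting the CNN tokens act as queries keeps the fused features at the CNN branch's spatial resolution, which is convenient when they are later reshaped for a dense depth decoder.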