2023
DOI: 10.1007/978-3-031-25082-8_1

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

Cited by 125 publications (45 citation statements)
References 29 publications
“…Surface normal calculation: project the 2D point p_{i,j} onto the 3D surface, corresponding to the point P_{i,j}. The eight-neighbor point constraint method [19] is used to determine the plane around the pixel point p_{i,j}. Then, the surface normals n_g of the four corresponding planes are calculated in the 3D coordinate system, g = 1, 2, 3, 4.…”
Section: Methods (mentioning)
confidence: 99%
“…Figure 4 shows the location of DGCs in the network. The calculation process of absolute depth estimation is divided into four steps [19]: (1) Surface normal calculation: project the 2D point p_{i,j} onto the 3D surface, corresponding to the point P_{i,j}. The eight-neighbor point constraint method [19] is used to determine the plane around the pixel point p_{i,j}.…”
Section: Dense Geometry Constraint Module (mentioning)
confidence: 99%
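
The surface normal step quoted above can be illustrated with a minimal sketch. It is not the citing paper's implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the helper names, and in particular the way the eight neighbors are paired into four planes (`NEIGHBOR_PAIRS`) are assumptions made for illustration only.

```python
import numpy as np

def backproject(row, col, depth, fx, fy, cx, cy):
    """Back-project a pixel to a 3D point, assuming a pinhole camera model."""
    x = (col - cx) * depth / fx
    y = (row - cy) * depth / fy
    return np.array([x, y, depth], dtype=float)

# Hypothetical pairing of the eight neighbors into four planes, given as
# (row, col) offsets. Each plane is spanned by the center and two neighbors.
NEIGHBOR_PAIRS = [
    ((-1, 0), (0, 1)),    # up, right
    ((1, 0), (0, -1)),    # down, left
    ((-1, 1), (1, 1)),    # up-right, down-right
    ((1, -1), (-1, -1)),  # down-left, up-left
]

def four_plane_normals(depth_map, i, j, fx, fy, cx, cy):
    """Return unit normals n_g, g = 1..4, of four planes around pixel (i, j)."""
    center = backproject(i, j, depth_map[i, j], fx, fy, cx, cy)
    normals = []
    for (dr1, dc1), (dr2, dc2) in NEIGHBOR_PAIRS:
        a = backproject(i + dr1, j + dc1, depth_map[i + dr1, j + dc1], fx, fy, cx, cy)
        b = backproject(i + dr2, j + dc2, depth_map[i + dr2, j + dc2], fx, fy, cx, cy)
        n = np.cross(a - center, b - center)  # normal of the plane through center, a, b
        normals.append(n / np.linalg.norm(n))
    return np.stack(normals)
```

The pairs are ordered so that all four normals point the same way; for a fronto-parallel surface of constant depth, each normal lies along the camera axis. The sketch does not handle border pixels or degenerate (collinear) neighbors.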
“…Given the rapid advancement in AI, we acknowledge that the pipeline we select may not sustain peak performance. For example, multi-modal LLMs are equipped with vision capabilities [43,51,78], and dense video captioning models may improve rapidly by benefiting from large-scale pre-trained models [85]. Despite technological advances, our work provides enduring insights that transcend the specific models.…”
Section: Selecting Models and Constructing Pipelines (mentioning)
confidence: 99%
“…Moreover, lightweight inference is hindered by the limited ability to compute global interactions in the spatial dimension due to input size constraints. Similarly, EdgeNeXt [21] combines depth-wise separable convolution and transposed attention mechanisms to introduce a split depth-wise transpose attention that enhances resource utilization. However, the structural design of this model is dependent on intricate submodules, such as Res2Net [22], ConvNeXt [23], and XCA [24].…”
Section: Introduction (mentioning)
confidence: 99%
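
The "transposed attention" mentioned in the statement above can be sketched as attention computed across channels rather than across spatial positions. The following is a minimal single-head NumPy illustration, not EdgeNeXt's actual module: the function name, the L2 normalization over tokens, the temperature `tau`, and the softmax axis are assumptions modeled on common cross-covariance attention formulations.

```python
import numpy as np

def transposed_attention(q, k, v, tau=1.0):
    """Channel-wise attention sketch. q, k, v have shape (N tokens, d channels)."""
    # Assumption: normalize each channel over the token axis before the dot product.
    qn = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-12)
    kn = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-12)
    logits = (qn.T @ kn) / tau                    # (d, d): channel-to-channel scores
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)        # softmax over key channels
    out = v @ attn.T                              # (N, d): mix value channels
    return out, attn
```

Because the attention map is d x d instead of N x N, its size does not grow with the number of spatial positions, which is the property that makes this style of attention attractive for resource-constrained settings.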