2020
DOI: 10.1016/j.cviu.2019.102844
Deep sensorimotor learning for RGB-D object recognition

Cited by 4 publications (4 citation statements) | References 53 publications

“…Recently, numerous works have shown that a network which performs well on generic image classification will also achieve high performance on the COVID-19 task. Numerous network structures [58]-[65] have been proposed for visual object recognition and have achieved gratifying performance. Both the ResNet [58] and DenseNet [59] architectures have been successfully applied to various image recognition tasks with outstanding performance.…”
Section: Methods
Mentioning, confidence: 99%
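
The excerpt treats ResNets as off-the-shelf recognition backbones. As a rough illustration only (the model choice, class count, and pretrained weights below are assumptions, not taken from the cited papers), a pretrained ResNet can be adapted to a new image classification task by swapping its final layer:

```python
import torch
from torchvision import models

# Hypothetical setup: adapt an ImageNet-pretrained ResNet-50 to a
# two-class image recognition task by replacing the final classifier.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # 2 classes is an assumption

x = torch.randn(1, 3, 224, 224)  # one RGB image at ResNet's usual input size
logits = model(x)                # shape: (1, 2)
```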
“…interaction observation to improve object recognition [40]-[42], or used to learn semantics and boost object localization for improved scene understanding [43], [44]. In action understanding, object affordances have been utilized for action anticipation [45]-[49] and hand grasp generation [50], [51], and used as context information to improve action recognition [52], [53].…”
Section: A. Affordance as Auxiliary Information
Mentioning, confidence: 99%
“…In more detail, we choose to combine RGB and depth information by first mapping each RGB frame to the corresponding depthmap resolution (employing the RGB-D alignment process described in [42]) and subsequently appending the resulting color frame to the depthmap along the channel dimension. This yields a 4×H×W input, where H = W = 300 are the height and width of the input image and depthmap after center-cropping (see also Section IV).…”
Section: A. Input Representations
Mentioning, confidence: 99%
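
A minimal PyTorch sketch of that input construction follows; the function name `make_rgbd_input` is hypothetical, and the bilinear resize is only a stand-in for the full RGB-D alignment of [42], which also uses the camera calibration:

```python
import torch
import torch.nn.functional as F

def make_rgbd_input(rgb: torch.Tensor, depth: torch.Tensor, size: int = 300) -> torch.Tensor:
    """Stack an RGB frame and a depthmap into a 4 x size x size tensor.

    rgb:   (3, H_rgb, W_rgb) color frame
    depth: (1, H_d, W_d) depthmap
    """
    # Map the color frame to the depthmap resolution. (Bilinear resampling is a
    # simplifying assumption; the excerpt uses the alignment process of [42].)
    rgb = F.interpolate(rgb.unsqueeze(0), size=depth.shape[-2:],
                        mode="bilinear", align_corners=False).squeeze(0)
    # Append the color frame to the depthmap along the channel dimension.
    rgbd = torch.cat([depth, rgb], dim=0)  # (4, H_d, W_d)
    # Center-crop to size x size (H = W = 300 in the excerpt).
    _, h, w = rgbd.shape
    top, left = (h - size) // 2, (w - size) // 2
    return rgbd[:, top:top + size, left:left + size]
```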
“…We develop three model variants to efficiently encode the spatio-temporal nature of the hand-object interaction, and investigate an attention mechanism that relies on the confidence of the appearance stream. The content of this chapter is based on the spatial and spatio-temporal models presented in [100] and [102], as well as on [101], where the attention mechanism is proposed.…”
Section: Contributions and Outline
Mentioning, confidence: 99%