2019
DOI: 10.3390/s19163579

An Efficient Three-Dimensional Convolutional Neural Network for Inferring Physical Interaction Force from Video

Abstract: Interaction forces are traditionally predicted by a contact-type haptic sensor. In this paper, we propose a novel and practical method for inferring the interaction forces between two objects based only on video data, captured by a non-contact camera sensor, without the use of common haptic sensors. Specifically, we predict the interaction force by observing the texture changes that an external force produces on the target object. For this purpose, our hypothesis is that a three-dimensional (3D) convolutional neural …

Cited by 23 publications (18 citation statements)
References: 25 publications
“…such as MobileNet [11] and ShuffleNet [36]. We believe that using this latest computation-efficient network architectures will accelerate the inference time of the proposed method and [14] is a good example.…”
Section: Results; citation type: mentioning
confidence: 99%
“…For example, 3D-CNNs are often used to recognize gestures or emotion in videos [35, 36, 49]. However, compared with approaches that combine CNN with RNN structures such as long short-term memory (LSTM) or gated recurrent units (GRU), 3D-CNN has a disadvantage that derives from its high computational complexity and excessive memory consumption, which can be a major burden for several applications that require high inference rates, especially on embedded devices [50]. Additionally, RNN architectures can be used to extract long-term temporal characteristics, whereas 3D-CNNs are mostly used for the extraction of short-term temporal pattern [51].…”
Section: Methodology and Background Knowledge; citation type: mentioning
confidence: 99%
“…For example, 3D-CNNs are often used to recognize gestures or emotion in videos [33,34,44]. However, compared with approaches that combine CNN with RNN structures such as long-short term memory (LSTM) or gated recurrent units (GRU), 3D-CNN has a disadvantage that derives from its high computational complexity and excessive memory consumption, which can be a major burden for several applications that require high inference rates, especially on embedded devices [45]. Additionally, RNN architecture can be used to extract long-term temporal characteristics, whereas 3D-CNN are mostly used for the extraction of short-term temporal pattern [46].…”
Section: Recurrent Neural Network and Gated Recurrent Units (GRU); citation type: mentioning
confidence: 99%
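
As a rough illustration of the computational-cost point raised in the citing statements above, the sketch below counts parameters and multiply-accumulate operations (MACs) for a single 3D convolution layer and for a 2D convolution layer applied frame by frame to the same clip. The helper names and the layer sizes (16 frames of 112x112, 64 channels, 3x3 kernels) are hypothetical and are not taken from the cited paper; stride 1, "same" padding, and no bias are assumed, and the recurrent part of a CNN+RNN model is not counted.

```python
def conv2d_cost(h, w, c_in, c_out, k):
    """Parameters and MACs of one 2D conv layer on a single frame.

    Assumes stride 1, 'same' padding (output is h x w), and no bias.
    """
    params = k * k * c_in * c_out
    macs = params * h * w  # one kernel application per output position
    return params, macs


def conv3d_cost(t, h, w, c_in, c_out, k, k_t):
    """Parameters and MACs of one 3D conv layer on a clip of t frames.

    Same assumptions as conv2d_cost; k_t is the temporal kernel depth.
    """
    params = k_t * k * k * c_in * c_out
    macs = params * t * h * w  # one kernel application per output voxel
    return params, macs


# Hypothetical layer sizes, chosen only for illustration.
T, H, W, C, K = 16, 112, 112, 64, 3

p2, m2_frame = conv2d_cost(H, W, C, C, K)
m2_clip = m2_frame * T  # the 2D layer is run once per frame
p3, m3_clip = conv3d_cost(T, H, W, C, C, K, k_t=K)

print(f"2D conv: {p2} params, {m2_clip} MACs per clip")
print(f"3D conv: {p3} params, {m3_clip} MACs per clip")
print(f"3D / 2D ratio: params x{p3 / p2:.0f}, MACs x{m3_clip / m2_clip:.0f}")
```

Under these assumptions, the 3D layer costs more than the per-frame 2D layer by a factor equal to the temporal kernel depth, in both parameters and MACs. This is only a per-layer estimate; whole-network cost and memory use depend on the architecture.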