End-to-End ConvNet for Tactile Recognition Using Residual Orthogonal Tiling and Pyramid Convolution Ensemble

Cao, Lele; Sun, Fuchun; Liu, Xiaolong; Huang, Wenbing; Ramamohanarao, Kotagiri; Li, Hongbo

doi:10.1007/s12559-018-9568-7

Cited by 11 publications

(6 citation statements)

References 56 publications


“…A 5 × 5 pressure sensor array attached to a two-finger manipulator was used to acquire tactile frames (Zhang et al, 2018); each frame was reshaped into a 1 × 25 vector and fed into an LSTM for feature extraction. The features extracted at different sampling moments were then assigned different weights by a self-attention module, and the weighted feature vectors were used for TOR. In (Cao et al, 2018), stacks of tactile frames and tactile flow, computed with a scheme similar to optical flow, were used as dual inputs: initial features were extracted by two residual orthogonal tiling convolution (ROTConv) branches, then refined by orthogonal tiling convolutions (OTConv), and the refined features were finally used to identify the object category with a softmax classifier. A 28 × 50 pressure sensor array attached to a two-finger manipulator was used to acquire tactile data (Pastor et al, 2019), and a 3D CNN was then employed to extract time-series features and perform object recognition.…”

Section: Introduction (mentioning)

confidence: 99%

Gradient adaptive sampling and multiple temporal scale 3D CNNs for tactile object recognition

et al. 2023


Tactile object recognition (TOR) is very important for the accurate perception of robots. Most TOR methods adopt a uniform sampling strategy to randomly select tactile frames from a sequence, which leads to a dilemma: sampling at a high rate yields a lot of redundant data, while sampling at a low rate misses important information. In addition, existing methods usually build the TOR model on a single time scale, which limits their ability to generalize to tactile data generated at different grasping speeds. To address the first problem, a novel gradient adaptive sampling (GAS) strategy is proposed, which adaptively determines the sampling interval according to the importance of the tactile data, so that as much key information as possible is captured when the number of tactile frames is limited. To handle the second problem, a multiple temporal scale 3D convolutional neural networks (MTS-3DCNNs) model is proposed, which downsamples the input tactile frames at multiple temporal scales (MTSs) and extracts deep features at each scale; the fused features generalize better when recognizing objects grasped at different speeds. Furthermore, the existing lightweight network ResNet3D-18 is modified into an MR3D-18 network that matches the smaller size of tactile data and prevents overfitting. Ablation studies show the effectiveness of the GAS strategy, MTS-3DCNNs, and the MR3D-18 network. Comprehensive comparisons with advanced methods demonstrate that our method achieves state-of-the-art (SOTA) performance on two benchmarks.
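The abstract does not say how the "importance" of tactile data is measured. The sketch below is one plausible reading, not the paper's GAS algorithm: it assumes importance is the mean absolute change between consecutive frames and places samples at equal steps of cumulative importance, so intervals shrink where the pressure changes fast. The multi-scale helper assumes the temporal scales are simple strides (1, 2, 4) over the sampled frames. All function names and parameters are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def frame_gradient(prev, curr):
    """Mean absolute per-taxel change between two consecutive tactile frames."""
    diffs = [abs(c - p)
             for row_p, row_c in zip(prev, curr)
             for p, c in zip(row_p, row_c)]
    return sum(diffs) / len(diffs)


def gradient_adaptive_sample(frames, k):
    """Return k increasing frame indices, denser where the frames change fastest.

    Hypothetical importance rule: frame i is weighted by its gradient with respect
    to frame i - 1, and samples are placed at equal steps of cumulative importance.
    """
    n = len(frames)
    if k < 2:
        raise ValueError("k must be at least 2")
    if k >= n:
        return list(range(n))
    importance = [0.0] + [frame_gradient(frames[i - 1], frames[i]) for i in range(1, n)]
    cumulative = list(accumulate(importance))
    total = cumulative[-1]
    if total == 0:
        # Static sequence: fall back to uniform sampling.
        return [j * (n - 1) // (k - 1) for j in range(k)]
    chosen = []
    for j in range(k):
        target = total * j / (k - 1)
        idx = bisect_left(cumulative, target)
        lo = chosen[-1] + 1 if chosen else 0   # keep indices strictly increasing
        hi = n - (k - j)                       # leave room for the remaining picks
        chosen.append(min(max(idx, lo), hi))
    return chosen


def multi_temporal_scales(indices, strides=(1, 2, 4)):
    """Hypothetical multi-scale views: keep every s-th sampled frame for each stride s."""
    return {s: indices[::s] for s in strides}
```

For a 10-frame sequence whose pressure only changes between frames 3 and 6, `gradient_adaptive_sample(frames, 4)` picks `[0, 4, 5, 6]`, concentrating three of the four samples in the changing segment, whereas uniform sampling would pick `[0, 3, 6, 9]`. Each stride in `multi_temporal_scales` would then feed a separate 3D CNN branch before fusion.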



“…Novel Convolutional Neural Networks (CNNs) are also achieving excellent results in multiple applications such as visual object recognition [25]. These methods can be used to recognize objects contacted through tactile sensors [26], [27], [28].…”

Section: Introduction (mentioning)

confidence: 99%

CNN-Based Methods for Object Recognition With High-Resolution Tactile Sensors

2019


Novel high-resolution pressure-sensor arrays allow pressure readings to be treated as standard images, so computer vision methods such as Convolutional Neural Networks (CNNs) can be used to identify contacted objects. In this paper, a high-resolution tactile sensor has been attached to a robotic end-effector to identify contacted objects. Two CNN-based approaches have been employed to classify pressure images: a transfer learning approach using a CNN pre-trained on an RGB-image dataset, and a custom CNN (TactNet) trained from scratch on tactile information. The transfer learning approach can be carried out either by retraining the classification layers of the network or by replacing these layers with an SVM. Overall, 11 configurations based on these methods have been tested: 8 based on transfer learning and 3 based on TactNet. Moreover, a performance study of the methods and a comparative discussion with the current state of the art in tactile object recognition are presented.
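Treating a pressure reading "as a standard image" for a network pre-trained on RGB images implies some preprocessing that the abstract does not describe. The sketch below shows one common way to do it, under stated assumptions: non-negative pressures are scaled to 8-bit gray levels (optionally clipped at a hypothetical `p_max`), resized with nearest-neighbour sampling to the network's input size, and the gray value is copied into all three channels. The function name and parameters are illustrative, not taken from the paper.

```python
def pressure_to_rgb(pressure, out_h, out_w, p_max=None):
    """Convert a 2-D pressure array (nested lists) into an out_h x out_w RGB image.

    Assumptions: pressures are non-negative; if p_max is None, the frame's own
    peak is used for scaling, otherwise readings above p_max are clipped.
    """
    h, w = len(pressure), len(pressure[0])
    peak = p_max if p_max is not None else max(max(row) for row in pressure)
    scale = 255.0 / peak if peak > 0 else 0.0
    image = []
    for y in range(out_h):
        src_y = y * h // out_h              # nearest-neighbour source row
        row = []
        for x in range(out_w):
            src_x = x * w // out_w          # nearest-neighbour source column
            p = min(max(pressure[src_y][src_x], 0), peak)
            v = int(round(p * scale))       # 8-bit gray level in [0, 255]
            row.append((v, v, v))           # replicate gray into R, G and B
        image.append(row)
    return image
```

In practice `out_h` and `out_w` would be set to whatever the chosen pre-trained network expects (224 × 224 is common for ImageNet models, but the source does not say which network or size was used). Using a fixed `p_max` across the dataset keeps gray levels comparable between frames, while per-frame scaling emphasises contact shape over absolute force.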


“…The main requirement behind this principle, which yields better results, is that there should be significant differences or diversity among the models. Many examples of the use of this principle in cognitive computation exist in the literature [44][45][46][47][48][49][50]. In accordance with the intrinsic hierarchy present in the data set, we will study two different scenarios: extrapolation with respect to different exercises and violinists.…”

Section: Introduction (mentioning)

confidence: 99%

Understanding Violin Players’ Skill Level Based on Motion Capture: a Data-Driven Perspective

et al. 2020


Learning to play and perform on a musical instrument is a complex cognitive task, requiring a high degree of conscious control and the coordination of an impressive number of cognitive and sensorimotor skills. For professional violinists, there is a physical connection with the instrument that allows the player to continuously shape the sound through sophisticated bowing techniques and fine hand movements. Hence, it is not surprising that violin training places great importance on right-hand techniques, which are responsible for most of the sound produced. In this paper, our aim is to understand which motion features can be used to efficiently and effectively distinguish a professional performance from that of a student without exploiting sound-based features. We collected, and made freely available, a dataset of motion capture recordings of violinists with different skill levels performing different exercises that cover different pedagogical and technical aspects. We then engineered specific features and trained a data-driven classifier to distinguish between two levels of violinist experience, namely beginners and experts. In accordance with the hierarchy present in the dataset, we study two different scenarios: extrapolation with respect to different exercises and with respect to different violinists. Furthermore, we study which features are most predictive of a violinist's quality, to corroborate the significance of the results. The results, in terms of both accuracy and insight into the cognitive problem, support the proposal and the use of the proposed technique as a tool to help students monitor and enhance their home study and practice.
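"Extrapolation with respect to different exercises and violinists" amounts to evaluating on groups never seen during training. A minimal sketch of such leave-one-group-out splits follows, assuming each recording is tagged with a group label (a violinist ID for one scenario, an exercise ID for the other); the helper name and the toy labels are hypothetical, and scikit-learn's `LeaveOneGroupOut` provides the same splitting for real pipelines.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for every distinct group label.

    groups[i] is the group of recording i, e.g. the violinist who played it.
    The test fold contains only the held-out group, so the classifier is always
    evaluated on a violinist (or exercise) absent from its training data.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx
```

For example, with recordings labelled `["alice", "bob", "alice", "carol"]`, the first split holds out both of alice's recordings (indices 0 and 2) and trains on indices 1 and 3. Passing exercise labels instead of violinist labels gives the second scenario.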

