2020
DOI: 10.1049/iet-ipr.2020.0019

Mutual information guided 3D ResNet for self‐supervised video representation learning

Cited by 4 publications (5 citation statements)
References 36 publications (82 reference statements)
“…Xue et al introduced a unique self-supervised learning technique based on ResNet, to enable automatic learning of video representations. Their suggested self-supervised learning approach was found to be effective and to have some degree of generalizability when utilized as an efficient pre-training technique for the job of identifying the activities in the video [8]. Salama et al suggested a ResNet-based breast cancer medical image-assisted diagnosis model and confirmed the model's efficacy.…”
Section: Related Work | Citation type: mentioning
confidence: 99%
“…In equation (8), act denotes the activation function and θ denotes the weight data. b denotes the bias and '…”
Section: B. Modified ResNet Algorithm Incorporating LSTMs | Citation type: mentioning
confidence: 99%
“…Our 3D ResNet 50 model employs 3D convolutions and 3D batch normalization. This spatiotemporal-feature-learning method enables the 3D ResNet 50 model to better capture the dynamic information in videos compared to 2D CNN, resulting in an enhanced ability to identify and differentiate the categories of videos 41,42. Furthermore, we have adjusted the structure of identity shortcuts to reduce information loss during downsampling and reduced the frequency at which the model doubles the number of channels after passing through the residual structure (see Supplementary Information for the detailed structure of the ResNet 50 model and the principle of the basic residual block in ResNet, S5 and S6).…”
Section: Model Construction and Evaluation | Citation type: mentioning
confidence: 99%
“…Based on the first-generation lightweight network mobile network vision 1 (MobileNetV1), the concepts of inverted residuals and linear bottlenecks are introduced into MobileNetV2 [27]. As a DW convolution cannot change the number of channels, feature extraction is restricted by the number of input channels.…”
Section: MobileNetV2 Network | Citation type: mentioning
confidence: 99%
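
The last citation statement above notes that a depthwise (DW) convolution cannot change the number of channels. A minimal pure-Python sketch of why that holds is given below. It is an illustration only, not code from any of the cited papers: the function names `depthwise_conv2d` and `pointwise_conv2d`, the nested-list tensor layout, and the toy input are all assumptions made for this example. Each input channel is filtered by its own kernel, so the output always has as many channels as the input; a following 1x1 (pointwise) convolution is what mixes channels and can change their number.

```python
def depthwise_conv2d(x, kernels):
    """Depthwise convolution, stride 1, no padding (hypothetical helper).

    x:       list of C channels, each an H x W list of lists
    kernels: list of C square k x k kernels, one per input channel
    Returns a list of C output channels: the channel count cannot change.
    """
    if len(x) != len(kernels):
        raise ValueError("depthwise convolution needs one kernel per input channel")
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        out.append([
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(w - k + 1)
            ]
            for i in range(h - k + 1)
        ])
    return out


def pointwise_conv2d(x, weights):
    """1x1 convolution (hypothetical helper).

    weights: list of C_out rows, each holding C_in mixing coefficients.
    Returns C_out channels, so this step can change the channel count.
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wt * x[c][i][j] for c, wt in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


# Toy input: 3 channels of size 4 x 4, channel c filled with the value c + 1.
x = [[[float(c + 1)] * 4 for _ in range(4)] for c in range(3)]
# One 3 x 3 all-ones kernel per channel.
kernels = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]

dw = depthwise_conv2d(x, kernels)            # still 3 channels, each 2 x 2
pw = pointwise_conv2d(dw, [[1.0] * 3] * 6)   # expanded to 6 channels
print(len(dw), len(pw))
```

In this sketch the depthwise step keeps the three input channels, and only the pointwise step expands them to six, which is the restriction the quoted statement describes.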