2023
DOI: 10.1109/jstars.2022.3219816

CNN, RNN, or ViT? An Evaluation of Different Deep Learning Architectures for Spatio-Temporal Representation of Sentinel Time Series

Abstract: Rich information in multi-temporal satellite images can facilitate pixel-level land cover classification. However, what is the most suitable deep learning architecture for high-dimension spatio-temporal representation of remote sensing time-series remains unclear. In this study, we theoretically analyzed the different mechanisms of the different deep learning structures, including the commonly used convolutional neural network (CNN), the high-dimension CNN (3D CNN), the recurrent neural network (RNN), and the …

Cited by 15 publications
(6 citation statements)
References 56 publications
“…The most important feature of RNN is the hidden state throughout the input and output of the network. It is used as an input to the RNN along with the input vectors, which are updated and then used as an output to the RNN along with the output vectors [26,27]. At this point, the updated hidden state that is output will be used as part of the next input, thus preserving the previous information.…”
Section: Recurrent Neural Network (mentioning)
confidence: 99%
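
The hidden-state recurrence described in this statement can be sketched as follows. This is a minimal illustration of a vanilla (Elman-style) RNN step in plain Python, not the architecture evaluated in the cited paper. The function names `rnn_step` and `run_rnn`, the tanh nonlinearity, and all weight and input values are assumptions chosen only for illustration.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: combine the current input with the previous hidden state."""
    h_new = []
    for j in range(len(h_prev)):
        pre = b_h[j]
        pre += sum(x_t[i] * W_xh[i][j] for i in range(len(x_t)))        # input contribution
        pre += sum(h_prev[k] * W_hh[k][j] for k in range(len(h_prev)))  # carried-over state
        h_new.append(math.tanh(pre))
    return h_new


def run_rnn(xs, W_xh, W_hh, b_h):
    """Feed a sequence through the cell; each updated state becomes part of the next input."""
    h = [0.0] * len(b_h)  # initial hidden state
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Hypothetical toy setup: 4 time steps, 2 input features, 2 hidden units.
W_xh = [[0.5, -0.3], [0.2, 0.4]]
W_hh = [[0.6, 0.1], [-0.2, 0.5]]
b_h = [0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]

states = run_rnn(xs, W_xh, W_hh, b_h)
```

Because each state is computed from the previous one, changing only the first input also changes the final state, which is the sense in which the hidden state preserves earlier information.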
“…Sensors 2023, 23, 8966 4 of 27 output to the RNN along with the output vectors [26,27]. At this point, the updated hidden state that is output will be used as part of the next input, thus preserving the previous information.…”
Section: Recurrent Neural Network (mentioning)
confidence: 99%
“…Meanwhile, active-based views are the quintessential views chosen to complement (and improve by fusion) the optical in classification tasks with MV models, e.g. by using SAR view [1], [15], [33], [34], [36], [99], [113], [120], [124], [126], [143], [163], [173], [174], [188], [190], [193], [199]-[202] or LiDAR view [94], [139]. Furthermore, the DSM view has been widely used together with the optical view [17]-[19], [55], [58], [64], [65], [76], [80], [84], [85], [88], [89], [92], [104], [123], [184], where on some occasions is a LiDAR-derived DSM [32], [34], [35], [59], [62], [81], [86], [87], [93], …”
Section: A Which Views Are Most Used In Earth Observation? (mentioning)
confidence: 99%
“…These works show that the predictive performance improves with respect to training on any of the single-views, e.g. with optical and active-based views (SAR/LiDAR/DSM) [1], [13], [13], [15], [18], [32]-[35], [58], [59], [65], [81], [86], [92]-[94], [96], [99], [102], [113], [133], [136], [143], [163], [169], [190], [191], [193], [199], [202]. This indicates that the views complement each other in the MV learning for EO tasks, in addition to the fact that there is evidence when other diverse views are chosen to supplement or replace the optical view [16], [70], [71], [106], [117], [159], [160], [181], [183], [210], [211].…”
Section: B Does the Use Of Additional Views Improve Predictive Perfor... (mentioning)
confidence: 99%
“…Prominent examples are satellite image fusion for improved land use/land cover classification [35], object detection [36,37], and change detection [38,39] in remote sensing images, as well as the delineation of agricultural fields from satellite images [40]. CNNs are appealing to the remote sensing community due to their inherent nature to exploit the two-dimensional structure of images, efficiently extracting spectral and spatial features, while RNNs can handle sequential input in continuous dimensions with sequential long-range dependency, thus making them appropriate for the analysis of the spectral-temporal information in time series stacks [19,34,41-44].…”
Section: Introduction (mentioning)
confidence: 99%