2021
DOI: 10.1088/1742-6596/2002/1/012068
Image Classification for Soybean and Weeds Based on ViT

Abstract: In this paper, a ViT deep neural network based on the self-attention mechanism is used to classify images of soybean and weeds. First, the overall image is split into multiple tiles; with each tile regarded as a word and the whole image regarded as a sentence, the image can be processed for semantic recognition with natural language processing techniques. We designed a ViT network with a sequence length of 50, an embedding dimension of 384, and 12 self-attention layers. With soybean weed classi…
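The abstract's "tiles as words" idea can be sketched as a patch-embedding step. The paper only states a sequence length of 50 and an embedding dimension of 384; the input size of 224×224 and patch size of 32×32 below (7×7 = 49 patches plus one class token) are one configuration consistent with those numbers, assumed here for illustration.

```python
import numpy as np

# Assumed dimensions: 224x224 input, 32x32 patches -> 49 patches + 1 [CLS]
# token gives the sequence length of 50 stated in the abstract.
IMG, PATCH, DIM = 224, 32, 384

def patch_embed(image, rng):
    """Split an (IMG, IMG, 3) image into patches and project each to DIM."""
    n = IMG // PATCH                                    # patches per side (7)
    patches = image.reshape(n, PATCH, n, PATCH, 3)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(n * n, -1)  # (49, 3072)
    W = rng.standard_normal((PATCH * PATCH * 3, DIM)) * 0.02  # linear projection
    tokens = patches @ W                                # (49, 384) patch tokens
    cls = np.zeros((1, DIM))                            # learnable [CLS] token
    return np.concatenate([cls, tokens], axis=0)        # (50, 384) sequence

rng = np.random.default_rng(0)
seq = patch_embed(rng.standard_normal((IMG, IMG, 3)), rng)
print(seq.shape)  # (50, 384)
```

The resulting (50, 384) token sequence is what the 12 self-attention layers described in the abstract would then process, exactly as a Transformer processes a 50-word sentence.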

Cited by 8 publications (4 citation statements). References 2 publications.
“…The performance of 1D transformer models has generally been less studied compared to their 2D counterparts [47,48] and other CNN-based models. Our study demonstrated their slight superiority in classifying weeds (mainly greenish), small-grain crops (mainly yellowish), and 'other', as well as in estimating total weed coverage in sub-field plots.…”
Section: Discussion
confidence: 99%
“…Multi-Head Attention was adopted to further increase the flexibility of the information to be attended to and potentially to boost the performance of a transformer model [46]. Transformer models have recently also been shown to perform better than CNN-based DL models in various weed and crop classification tasks [47,48], and even in semantic segmentation of different weed species [48]. However, these studies focused on high-resolution imagery taken by proximal sensors or UAVs flying at low altitudes, and DL-based semantic segmentation frameworks were only implemented in a 2D space, e.g., both 2D transformer and 2D CNN.…”
Section: Introduction
confidence: 99%
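The multi-head attention mechanism referenced in the statement above can be sketched in a few lines. The head count and dimensions below are illustrative, not taken from any of the cited papers; the sequence shape (50, 384) simply mirrors the configuration reported in this paper's abstract.

```python
import numpy as np

def multi_head_attention(x, heads, rng):
    """Minimal single-layer multi-head self-attention sketch (illustrative)."""
    seq, dim = x.shape
    hd = dim // heads                                   # per-head dimension
    # Random projection matrices stand in for learned weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((dim, dim)) * 0.02 for _ in range(4))
    # Project, then split the embedding into `heads` parallel subspaces.
    q = (x @ Wq).reshape(seq, heads, hd).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq, heads, hd).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq, heads, hd).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)     # (heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)           # row-wise softmax
    out = (weights @ v).transpose(1, 0, 2).reshape(seq, dim)  # merge heads
    return out @ Wo                                     # output projection

rng = np.random.default_rng(0)
y = multi_head_attention(rng.standard_normal((50, 384)), heads=6, rng=rng)
print(y.shape)  # (50, 384)
```

Each head attends over the full token sequence in its own subspace, which is the "flexibility of the information to be attended to" the citing authors describe.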
“…The introduction of the Vision Transformer (ViT) [35], proposed by Google in 2020, signified a major shift in deep learning for image analysis. ViT represents the first successful application of the Transformer architecture, initially designed for natural language processing (NLP) tasks [36], to image classification, subsequently demonstrating significant potential in plant disease detection [37]. This innovation not only revealed the potential of the Transformer structure for processing non-sequential data [38] but also opened a new chapter in the application of self-attention mechanisms in the field of computer vision [39].…”
Section: Vision Transformer
confidence: 99%
“…Experiments were conducted to compare the effect of varying training and test set sizes. Similarly, Liang et al. [24] used ViT for the classification of soybean and weeds.…”
Section: Related Studies
confidence: 99%