2016
DOI: 10.1587/transinf.2015edl8212

Food Image Recognition Using Covariance of Convolutional Layer Feature Maps

Authors: Atsushi TATSUMA and Masaki AONO, Members

Abstract: Recent studies have obtained superior performance in image recognition tasks by using, as an image representation, the fully connected layer activations of Convolutional Neural Networks (CNN) trained with various kinds of images. However, the CNN representation is not very suitable for fine-grained image recognition tasks involving food image recognition. For improving performance of the CNN representation in food image recognition, we propose a novel im…
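The abstract is cut off before the method is described, so the following is only a minimal sketch of the general idea named in the title: summarizing a stack of convolutional feature maps by the covariance between channels. The function names, the (C, H, W) tensor layout, and the choice to flatten the upper triangle into a vector are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def feature_map_covariance(feature_maps: np.ndarray) -> np.ndarray:
    """Return the C x C covariance matrix of a (C, H, W) feature-map tensor.

    Hypothetical helper: each spatial position is treated as one
    C-dimensional observation.
    """
    c, h, w = feature_maps.shape
    samples = feature_maps.reshape(c, h * w)
    # np.cov treats rows as variables and columns as observations by default.
    return np.cov(samples)


def covariance_descriptor(feature_maps: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle (diagonal included) into a vector.

    Assumption: the covariance matrix is symmetric, so the upper triangle
    carries all of its information.
    """
    cov = feature_map_covariance(feature_maps)
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]


# Random stand-in for real CNN activations: 8 channels of 6 x 6 maps.
rng = np.random.default_rng(0)
maps = rng.standard_normal((8, 6, 6))
cov = feature_map_covariance(maps)
desc = covariance_descriptor(maps)
```

With C channels the descriptor has C(C+1)/2 entries, regardless of the spatial size of the maps.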

Citations: cited by 24 publications (10 citation statements)
References: 19 publications
“…Otherwise, the prediction is wrong. Tatsuma and Aono () reported a new approach for food classification by using the covariances of features of trained CNN as the representation of images, and achieved 58.65% for average accuracy. Yanai and Kawano () used the fine‐tuned AlexNet to achieve the top‐1 accuracy for 70.41%.…”
Section: Deep Learning Applications In Food
Citation type: mentioning
confidence: 99%
“…The results of all the state-of-the-art methods are from the leaderboard of SHREC 2016 Large-scale 3D Shape Retrieval from ShapeNet Core55 [1]. In DB-FMCD-FUL-LCDR [44], Feature Maps Covariance Descriptor (FMCD) is calculated on each depth-buffer image rendered for a given 3D shape and ranking scores is calculated by using the Locally Constrained Diffusion Ranking (LCDR). Then in CCMLT, each 3D shape is rendered into 36 channel of data via concatenation of 36 2D projected images in sequence, where Multi-channel data is utilized to train a feature fusion matrix inside a CNN [45].…”
Section: Comparison To State-of-the-art Methods
Citation type: mentioning
confidence: 99%
“…After the above processing operations, convolutional layer is used to capture the local features of traffic data. Convolutional layer [33], [34] is the most important part of the CNN, which convolves the input images (or feature maps) with multiple convolutional kernels to create different feature maps. According to [35], the shallower convolutional layers whose receptive field is narrow can extract local information, and while the deeper layers can capture global information with larger vision field.…”
Section: B Multiple Convolutional Layers
Citation type: mentioning
confidence: 99%
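
The statement above describes a convolutional layer as convolving an input with multiple kernels to produce one feature map per kernel. A minimal NumPy sketch of that operation follows, assuming a single-channel input, stride 1, and no padding; the function name `conv2d_valid` and the two example kernels are illustrative choices, not taken from the cited works.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Apply K kernels of shape (kh, kw) to an (H, W) image.

    Returns K feature maps of shape (H - kh + 1, W - kw + 1).
    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[i:i + kh, j:j + kw]
            # One output value per kernel at this spatial position.
            out[:, i, j] = np.sum(kernels * patch, axis=(1, 2))
    return out


image = np.arange(25, dtype=float).reshape(5, 5)
kernels = np.stack([
    np.ones((3, 3)) / 9.0,                    # local mean filter
    np.array([[-1, 0, 1]] * 3, dtype=float),  # horizontal gradient filter
])
feature_maps = conv2d_valid(image, kernels)   # shape (2, 3, 3)
```

Each kernel sees only a 3 x 3 window here, which is the narrow receptive field the statement associates with shallow layers; stacking further layers on top of `feature_maps` would widen the region of the input that each output value depends on.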