Fusion Learning using Semantics and Graph Convolutional Network for Visual Food Recognition

Zhao, Heng; Yap, Kim-Hui; Kot, Alex C.

doi:10.1109/wacv48630.2021.00175

Cited by 19 publications

(16 citation statements)

References 31 publications


“…(4) Large-Scale Few-Shot Food Recognition (LS-FSFR) Recently, there are some works on few-shot food recognition on small/medium-scale food categories [14], [43]. In contrast, LS-FSFR is a more realistic task that aims to identify hundreds of novel food categories without forgetting those categories, where each novel category has only a few samples [103].…”

Section: Discussion (mentioning)

confidence: 99%

“…For example, Qiu et al [9] propose a PAR-Net to mine discriminative food regions to improve the performance of classification. There are also some recent works on few-shot food recognition [14], [43]. For example, Zhao et al [14] propose a fusion learning framework, which utilizes a graph convolutional network to capture inter-class relations between image representations and semantic embeddings of different categories for both few-shot and many-shot food recognition.…”

Section: Related Work (mentioning)

confidence: 99%

“…There are also some recent works on few-shot food recognition [14], [43]. For example, Zhao et al [14] propose a fusion learning framework, which utilizes a graph convolutional network to capture inter-class relations between image representations and semantic embeddings of different categories for both few-shot and many-shot food recognition. In addition, there are many works [26], [28], [35], [44], [45], which introduce additional context information, e.g., GPS and ingredient information to improve the recognition performance.…”

Section: Related Work (mentioning)

confidence: 99%

“…Food computing [1] has raised great interest recently for its various applications in health, culture, etc. It contains different tasks, such as food recognition [9], [11], [14], detection [40], segmentation [46], retrieval [47], [48], [49] and generation [18], [50]. Among these tasks, food recognition is an important and basic task for further supporting more complex food-relevant vision and multimodal tasks.…”

Section: Related Work (mentioning)

confidence: 99%

“…In addition, food image recognition is an important branch of fine-grained visual classification, and thus has important theoretical research significance. For these reasons, food recognition has been drawing more attention in computer vision and beyond [8], [9], [10], [11], [12], [13], [14].…”

Section: Introduction (mentioning)

confidence: 99%

Large Scale Visual Food Recognition

Min, Wang, Liu et al. 2021

Preprint

Food recognition plays an important role in food choice and intake, which is essential to the health and well-being of humans. It is thus of importance to the computer vision community, and can further support many food-oriented vision and multimodal tasks, e.g., food detection and segmentation, cross-modal recipe retrieval and generation. Unfortunately, although released large-scale datasets have driven remarkable advancements in generic visual recognition, the food domain still largely lags behind. In this paper, we introduce Food2K, which is the largest food recognition dataset with 2,000 categories and over 1 million images. Food2K surpasses existing food recognition datasets in both categories and images by one order of magnitude, and thus establishes a new challenging benchmark to develop advanced models for food visual representation learning. Furthermore, we propose a deep progressive region enhancement network for food recognition, which mainly consists of two components, namely progressive local feature learning and region feature enhancement. The former adopts improved progressive training to learn diverse and complementary local features, while the latter utilizes self-attention to incorporate richer context with multiple scales into local features for further local feature enhancement. Extensive experiments on Food2K demonstrate the effectiveness of our proposed method. More importantly, we have verified the better generalization ability of Food2K in various tasks, including food image recognition, food image retrieval, cross-modal recipe retrieval, food detection and segmentation. Food2K can be further explored to benefit more food-relevant tasks including emerging and more complex ones (e.g., nutritional understanding of food), and models trained on Food2K can be expected to serve as backbones to improve the performance of more food-relevant tasks. We also hope Food2K can serve as a large-scale fine-grained visual recognition benchmark and contribute to the development of large-scale fine-grained visual analysis.

Section: Discussion (mentioning)

confidence: 99%