2018
DOI: 10.48550/arxiv.1810.06553
Preprint

Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

Cited by 9 publications (22 citation statements)
References 0 publications
“…The proposed method is trained and evaluated on Recipe1M [22], the largest publicly available multi-modal food database. Recipe1M [17], used here with Ingredient Attention based instruction encoding, provides over 1 million recipes (ingredients and instructions), accompanied by one or more images per recipe, leading to 13 million images. The large corpus is supplemented with semantic information (1048 meal classes) for injecting an additional source of information into potential models.…”
Section: Materials and Methods 2.1 Database
Mentioning confidence: 99%
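As a rough illustration of what such a Recipe1M record contains (ingredients, instructions, one or more images, and an optional meal class), a minimal sketch follows; the field names and values are hypothetical, not the dataset's actual JSON schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RecipeEntry:
    """Hypothetical in-memory view of one Recipe1M record.

    Field names are illustrative; the released JSON uses its own keys.
    """
    recipe_id: str                    # unique identifier of the recipe
    title: str                        # recipe title
    ingredients: List[str]            # one string per ingredient line
    instructions: List[str]           # ordered cooking steps
    image_paths: List[str] = field(default_factory=list)  # zero or more images per recipe
    meal_class: Optional[int] = None  # one of ~1048 semantic classes, if available

# Example usage with made-up values:
entry = RecipeEntry(
    recipe_id="000018c8a5",
    title="Tomato Basil Soup",
    ingredients=["4 tomatoes", "1 bunch basil", "2 cups stock"],
    instructions=["Chop the tomatoes.", "Simmer with stock.", "Blend and serve."],
    image_paths=["images/000018c8a5_0.jpg"],
    meal_class=412,
)
```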
“…The proposed model architecture is based on a multi-path approach for each of the involved input data types, namely instructions, ingredients and images, similarly to [17]. In Figure 2, the overall structure is presented.…”
Section: Model Architecture
Mentioning confidence: 99%
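A minimal PyTorch sketch of such a multi-path design is given below: one encoder path per input type (instructions, ingredients, image features), each projected into a common embedding space. The layer sizes, LSTM choice, and assumption of pre-extracted CNN image features are illustrative assumptions, not the cited model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPathEncoder(nn.Module):
    """Minimal multi-path sketch: separate encoders for instructions,
    ingredients, and images, each projected into a shared embedding space.
    Dimensions and submodules are illustrative assumptions."""

    def __init__(self, vocab_size=30000, txt_dim=300, img_feat_dim=2048, embed_dim=1024):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, txt_dim, padding_idx=0)
        # One recurrent path per textual input type.
        self.instr_rnn = nn.LSTM(txt_dim, txt_dim, batch_first=True)
        self.ingr_rnn = nn.LSTM(txt_dim, txt_dim, batch_first=True)
        # Image features are assumed to come from a pretrained CNN (e.g. pooled ResNet output).
        self.img_proj = nn.Linear(img_feat_dim, embed_dim)
        self.txt_proj = nn.Linear(2 * txt_dim, embed_dim)

    def encode_recipe(self, instr_tokens, ingr_tokens):
        # Use the final LSTM hidden state of each textual path.
        _, (h_instr, _) = self.instr_rnn(self.word_emb(instr_tokens))
        _, (h_ingr, _) = self.ingr_rnn(self.word_emb(ingr_tokens))
        joint = torch.cat([h_instr[-1], h_ingr[-1]], dim=-1)
        return F.normalize(self.txt_proj(joint), dim=-1)

    def encode_image(self, img_features):
        return F.normalize(self.img_proj(img_features), dim=-1)

# Example: embed a batch of 4 recipes and 4 image feature vectors.
model = MultiPathEncoder()
recipe_emb = model.encode_recipe(torch.randint(1, 30000, (4, 50)), torch.randint(1, 30000, (4, 20)))
image_emb = model.encode_image(torch.randn(4, 2048))
print(recipe_emb.shape, image_emb.shape)  # torch.Size([4, 1024]) torch.Size([4, 1024])
```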
“…Recent introductions of large-scale food-related datasets have further accelerated the research improvements on food understanding. Considering the application purpose, the datasets can be categorized into two groups: food recognition [3,38] and cross-modal recipe retrieval [48,37,41,42,7,48]. We focus on the recipe retrieval task in this paper, aiming at retrieving relevant cooking recipes with respect to the image query and vice versa.…”
Section: Related Work
Mentioning confidence: 99%
“…A typical recipe consists of a list of ingredients and cooking instructions which may not directly align with the appearance of the corresponding food image. Typically, recent efforts have formulated im2recipe as a cross-modal retrieval problem [48,37,6,62], aligning matching recipe-image pairs in a shared latent space with retrieval learning approaches. Concretely, prior work builds two independent networks to encode textual recipes (ingredients and cooking instructions) and food images into embeddings, respectively.…”
Section: Introduction
Mentioning confidence: 99%
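A minimal sketch of the retrieval-learning idea described in that statement: embeddings from the two independent towers are aligned in the shared space by pulling matching recipe-image pairs together and pushing in-batch mismatches apart with a triplet-style hinge on cosine similarity. The margin value and in-batch negative sampling are assumptions for illustration, not the specific objective of any of the cited works.

```python
import torch
import torch.nn.functional as F

def triplet_retrieval_loss(recipe_emb, image_emb, margin=0.3):
    """Triplet-style alignment loss on L2-normalized embeddings.

    recipe_emb, image_emb: (batch, dim) tensors where row i of each tensor
    forms a matching pair. Every other row in the batch acts as a negative.
    Margin and in-batch negative sampling are illustrative choices.
    """
    recipe_emb = F.normalize(recipe_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    sim = recipe_emb @ image_emb.t()                  # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)                     # similarity of matching pairs
    # Hinge on every mismatched pair, in both retrieval directions.
    cost_img = (margin + sim - pos).clamp(min=0)      # recipe query -> wrong images
    cost_rec = (margin + sim - pos.t()).clamp(min=0)  # image query -> wrong recipes
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    return ((cost_img + cost_rec) * mask).mean()

# Example usage with random embeddings from the two towers:
loss = triplet_retrieval_loss(torch.randn(8, 1024), torch.randn(8, 1024))
print(float(loss))
```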
“…Images from many sources are often plated and shot with an artistic intention to increase appeal rather than represent realism. The few large-scale and diverse datasets, such as Recipe1M [12], are mined from recipe websites. While these contain valuable dish-level, ingredient-level, and preparation attribute annotations, they almost always lack annotations for the portion sizes shown in the photos.…”
Section: Introduction
Mentioning confidence: 99%