Proceedings of the 26th ACM International Conference on Multimedia 2018
DOI: 10.1145/3240508.3240627

Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval

Citations: cited by 100 publications (144 citation statements)
References: 36 publications
“…The resource types include image , recipe , food title , title and ingredients , and recipe-image pair ( , ). The complete training data refers to the set of recipe-image pairs for fully supervised model training [10,33,36]. In the remaining sections, we abbreviate the source and target domains with the superscripts and respectively.…”
Section: Cross-domain Food Transfer (mentioning)
confidence: 99%
“…Based upon these prior works [4,7,9,29,33,36], this paper extends from cross-modal to cross-domain food retrieval. Leveraging on image-recipe pairs in a source domain, we consider the problem of food transfer as recognizing food in a target domain with new food categories and attributes.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recipe1M [27] is the only large-scale food dataset with English recipes and images publicly available. Many related works [6,26,27,32] are based on this dataset. The raw dataset contains more than 1 million recipes and almost 900k images.…”
Section: Experiments, 4.1 Datasets (mentioning)
confidence: 99%
“…People tend to spend much time on recipes because cooking is closely related to our life. Lots of works have been done to deconstruct and understand food, including food classification [8,16], recipe-image embedding [6,27,32] and image-to-recipe generation [26]. Furthermore, dish appearance visualization in advance will be of great help for designing new recipes, which provides evident significance to image generation from given recipes.…”
Section: Introduction (mentioning)
confidence: 99%
“…Deriving a joint representation from different modalities associated with a multimedia item has been a long-standing research question in cross-media retrieval [6,19,30,34,37]. The main idea behind such approaches is learning a common space to which different modalities, usually text and visual, can be mapped and directly compared.…”
Section: Social Multimedia Representation (mentioning)
confidence: 99%