Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413765

Multi-modal Cooking Workflow Construction for Food Recipes

Abstract: Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, such that the recipe can be converted into a graph describing the temporal workflow of the recipe. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes due to the lack of large-scale labeled datasets. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In t…

Cited by 15 publications
(1 citation statement)
References 33 publications
“…Although the above work has focused on parsing text-only recipes, little work has addressed cross-modal analysis due to limited available datasets. Pan et al. [11] recently created a novel cross-modal dataset, MM-ReS. This dataset consists of recipes, image sequences, and annotated tree structures, allowing us to analyze the cause-and-effect relations between step texts and images in the recipe and image sequence.…”
Section: B Structure Estimation For Context Dependency (mentioning)
confidence: 99%