2021
DOI: 10.1109/access.2020.3043452

Structure-Aware Procedural Text Generation From an Image Sequence

Abstract: Creating new value by combining materials is an important activity in our society. From everyday cooking to industrial manufacturing, the way to do so is often described as a procedural text. As pointed out by previous studies in natural language understanding, one important property of procedural text is its dependence on context, namely the merging operations applied to materials, which can be represented by a graph or tree structure. This paper aims to investigate the impact of explicitly introducing su…
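To make the idea of "merging operations represented as a tree" concrete, here is a toy sketch. It is not the paper's method, which generates text from an image sequence. Every name in it (`Material`, `Action`, `linearize`) is hypothetical. Leaves are raw materials and each internal node is an action that merges its inputs into an intermediate product. A post-order traversal turns the tree into ordered procedural steps, so each step is emitted only after the steps that produce its inputs.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Material:
    """Hypothetical leaf node: a raw material such as an ingredient."""
    name: str


@dataclass
class Action:
    """Hypothetical internal node: an action that merges its inputs."""
    verb: str
    inputs: List["Node"] = field(default_factory=list)


Node = Union[Material, Action]


def linearize(root: Node) -> List[str]:
    """Turn a merge tree into ordered step sentences via post-order traversal."""
    steps: List[str] = []

    def visit(node: Node) -> str:
        # Raw materials are referred to directly by name.
        if isinstance(node, Material):
            return node.name
        # Visit all inputs first so intermediate products already have a step.
        parts = [visit(child) for child in node.inputs]
        steps.append(f"{node.verb} {' and '.join(parts)}.")
        # Later steps refer to this intermediate product by its step number.
        return f"the result of step {len(steps)}"

    visit(root)
    return steps


if __name__ == "__main__":
    # Toy recipe: mix egg and milk, then fry that mixture with butter.
    recipe = Action("fry", [
        Action("mix", [Material("egg"), Material("milk")]),
        Material("butter"),
    ])
    for i, step in enumerate(linearize(recipe), start=1):
        print(f"{i}. {step}")
```

For this toy recipe the sketch yields "mix egg and milk." followed by "fry the result of step 1 and butter." The second step can only be stated after the first because it depends on the merged intermediate product. That is the context dependence the abstract refers to.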


Cited by 18 publications (4 citation statements). References 27 publications.
“…Video captioning algorithms were also used to extract recipes from an uncut video (Nishimura, Hashimoto, et al, 2022). This sort of system can also be made using off-the-shelf vision models and implemented into a robotic chef system (Sochacki, Abdulali, Hosseini, et al, 2023).…”
Section: Recipes Understanding and Learning (mentioning; confidence: 99%)
“…Robotic Chef was shown to translate a recipe into actions using a hard‐coded text analysis logic and follow them (Bollini et al, 2013). Video captioning algorithms were also used to extract recipes from an uncut video (Nishimura, Hashimoto, et al, 2022). This sort of system can also be made using off‐the‐shelf vision models and implemented into a robotic chef system (Sochacki, Abdulali, Hosseini, et al, 2023).…”
Section: Emerging Technologies (mentioning; confidence: 99%)
“…This task is more complex than regular image captioning [15,35] due to the difficulty in decoding long recipe texts. Nishimura et al [22,23] approached this problem by generating instructions from a sequence of images. Wang et al [36] first estimate the intermediate tree-structured representation of cooking instructions from an image, and generate full sentences from it.…”
Section: Cross-modal Synthesis (mentioning; confidence: 99%)
“…By verbalizing the task contents, a task procedure can be materialized and the reproducibility of the same work by humans and robots can be improved. Previously, Nishimura et al [11] generated instructions by verbalizing a series of task information from a series of cooking images. Erdal et al [2] proposed a framework that could automatically describe task motions from demonstration videos of human tasks.…”
Section: Related Work (mentioning; confidence: 99%)