2023
DOI: 10.48550/arxiv.2302.06706
Preprint
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

Abstract: Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper we set out to investigate their planning capabilities. We aim to evaluate (1) how good LLMs are by themselves at generating and validating simple plans in commonsense planning tasks (of the type that humans are generally quite good at) and (2) how good LLMs are as a source of heuristic guidance for other agents, either AI planners or human planners, in their planning tasks. To investigate these…

Cited by 10 publications (12 citation statements) | References 27 publications
“…Existing works have demonstrated the planning abilities of both decoder-type (Pallagani et al. 2022) and encoder-decoder-type architectures (Valmeekam et al. 2023, 2022). Since the generated plan is in free-form language, it may contain words that are unrecognizable to the environment or use incorrect syntax, so it cannot be directly translated into actionable steps in the environment.…”
Section: Experimental Setup (Say Model)
Citation type: mentioning
Confidence: 99%
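As a concrete illustration of the grounding problem this excerpt raises, here is a minimal sketch (not from the cited papers) of mapping free-form LLM plan steps onto an environment's fixed action vocabulary; the action strings and the fuzzy-matching cutoff are illustrative assumptions.

import difflib

# Hypothetical action vocabulary: the environment only understands these strings.
ENV_ACTIONS = ["pick-up block-a", "stack block-a block-b", "put-down block-a"]

def ground_step(free_form_step, actions=ENV_ACTIONS, cutoff=0.6):
    """Map one free-form plan step to the closest known action, or None."""
    matches = difflib.get_close_matches(free_form_step.lower(), actions, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(ground_step("Pick up Block A"))  # -> 'pick-up block-a' (paraphrase grounded)
print(ground_step("heat the block"))   # -> None: unrecognizable for the environment

Steps that fail to ground (the None case) are exactly the "unrecognizable words or incorrect syntax" failure mode the excerpt describes.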
“…using a controller to regulate the heater where only a knob exists), leading to infeasible plans. Moreover, such models focus greedily on the next actionable step without considering its relevance to the ultimate goal, resulting in longer, cost-inefficient plans (Valmeekam et al. 2023). Recent works like SayCan (Ahn et al. 2022) have sought to address the affordance problem by using pretrained skills to evaluate an action's executability: can the action be executed in the current state?…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
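To make the SayCan idea in this excerpt concrete, here is a minimal sketch of its scoring rule: choose the action that maximizes the product of a task-relevance score ("say") and an executability score ("can"). Both scoring functions below are toy stand-ins, not the actual SayCan models.

def llm_relevance(action, goal):
    """Toy stand-in for the LLM's estimate that `action` is a useful next step."""
    return 0.9 if action.split()[0] in goal else 0.1

def affordance(action, state):
    """Toy stand-in for a learned value function: can `action` run in `state`?"""
    return 1.0 if action in state else 0.0

def saycan_pick(candidates, goal, state):
    # SayCan-style selection: maximize p_say * p_can.
    return max(candidates, key=lambda a: llm_relevance(a, goal) * affordance(a, state))

state = {"grasp cup", "move-to sink"}  # actions executable in the current state
print(saycan_pick(["grasp cup", "pour water"], "grasp the cup and pour water", state))
# -> 'grasp cup': 'pour water' is relevant to the goal but not yet executable

The product form is what filters out relevant-but-infeasible steps, the affordance failure the excerpt describes.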
“…STARS attempts to verify LLM responses before acting on the goal a response indicates. There are many approaches to verifying LLM knowledge, including (1) response sampling (Wang et al. 2023), (2) use of other sources of knowledge, such as planning (Valmeekam et al. 2023) or an LLM (Kim, Baldi, and McAleer 2023), and (3) human feedback/annotation (TidyBot). Recursively Criticizes and Improves (RCI; Kim, Baldi, and McAleer 2023) verifies LLM output by prompting the LLM again to identify (potential) issues.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
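As a concrete illustration of the RCI-style loop this excerpt mentions, here is a minimal sketch in which the model is re-prompted to criticize its own output and then to improve it. The `llm` callable and the prompt wording are hypothetical, not the actual prompts from Kim, Baldi, and McAleer (2023).

def rci_refine(llm, task, rounds=2):
    """Generate an answer to `task`, then repeatedly critique and revise it."""
    answer = llm("Task: " + task + "\nAnswer:")
    for _ in range(rounds):
        # Step 1: ask the same LLM to identify (potential) issues with its answer.
        critique = llm("Task: " + task + "\nAnswer: " + answer +
                       "\nReview the answer above and list any problems with it.")
        # Step 2: ask it to rewrite the answer so those problems are fixed.
        answer = llm("Task: " + task + "\nAnswer: " + answer +
                     "\nProblems: " + critique +
                     "\nWrite an improved answer that fixes these problems.")
    return answer

Response sampling (Wang et al. 2023) takes the complementary route: draw several independent answers and keep the majority, rather than asking the model to critique a single one.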
“…Although recent AI systems show some abstraction and reasoning abilities, they seem to rely on pattern matching, shortcuts, and memorization of some aspects of the reasoning process [50-54]. Despite the remarkable ability of Large Language Models (LLMs) [9, 70-74] to learn some patterns of the reasoning process and then apply them in different contexts, they still lack understanding of the coherent text they produce when probed more deeply [68, 75-78, 86-88]. Another flaw in the argument that deep learning will lead to human-level intelligence is the assumption that intelligence will somehow emerge from training neural networks, without any convincing justification for this assumption.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%