2023
DOI: 10.31219/osf.io/8r3ma
Preprint

Exploring GPT-3 Model's Capability in Passing the Sally-Anne Test: A Preliminary Study in Two Languages

Abstract: This study aimed to investigate the capability of GPT-3 in passing the Sally-Anne test in two languages: Chinese and English. Three experiments were conducted to evaluate the model's performance under different questioning prompts. The results showed that with appropriate prompts, the model was able to consistently pass the test. The findings highlight the sensitivity of GPT-3 to different prompts and demonstrate the potential for using large language models as research subjects. This study sheds light on the …
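The abstract describes a prompting protocol: GPT-3 is presented with the Sally-Anne scenario under different questioning prompts in Chinese and English. As a rough illustration only, the sketch below shows how such a query could be issued through the legacy OpenAI completions interface; the prompt wording, the model name (text-davinci-003), and the decoding settings are assumptions made for the sketch, not the study's actual materials.

```python
# Minimal sketch of posing a Sally-Anne false-belief question to a GPT-3-style
# completion model. Prompt text, model name, and decoding parameters are
# illustrative assumptions; the study's exact prompts are not reproduced here.
import openai  # legacy (<1.0) OpenAI Python SDK interface

openai.api_key = "YOUR_API_KEY"  # placeholder key

SALLY_ANNE_EN = (
    "Sally puts her marble in the basket and leaves the room. "
    "While Sally is away, Anne moves the marble from the basket to the box. "
    "Sally comes back. Where will Sally look for her marble first?"
)
# A parallel Chinese version of the same scenario would be queried the same way.

response = openai.Completion.create(
    model="text-davinci-003",  # assumed GPT-3 completion model
    prompt=SALLY_ANNE_EN,
    max_tokens=50,
    temperature=0,             # deterministic decoding for repeated trials
)
print(response["choices"][0]["text"].strip())
```

Repeating the call with variant question wordings would mirror the paper's comparison of different questioning prompts.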

Cited by 4 publications (4 citation statements)
References 13 publications

“…The dis-embodied cognition of GPT models could explain failures in recognizing faux pas, but they may also underlie their success on other tests. One example is the false belief test, one of the most widely used tools so far for testing the performance of LLMs on social cognitive tasks 19,21–23,25,51,52. In this test, participants are presented with a story where a character's belief about the world (the location of the item) differs from the participant's own belief.…”
Section: Discussion (mentioning, confidence: 99%)
“…The recent rise of large language models (LLMs), such as generative pre-trained transformer (GPT) models, has shown some promise that artificial theory of mind may not be too distant an idea. Generative LLMs exhibit performance that is characteristic of sophisticated decision-making and reasoning abilities 19,20 including solving tasks widely used to test theory of mind in humans 21–24. However, the mixed success of these models 23, along with their vulnerability to small perturbations to the provided prompts, including simple changes in characters' perceptual access 25, raises concerns about the robustness and interpretability of the observed successes.…”
Section: Performance Across Theory of Mind Tests (mentioning, confidence: 99%)
“…It is used in developmental psychology to measure a person's social cognitive ability to attribute false beliefs to others. See here and [21]. Various versions of the problem can be defined based on whether the boxes are transparent or not.…”
Section: Reasoning (mentioning, confidence: 99%)
“…The recent rise of Large Language Models (LLMs), such as Generative Pre-trained Transformer (GPT) models, has shown some promise that AI Theory of Mind may not be too distant an idea. Generative LLMs exhibit a range of emergent capacities for sophisticated decision-making and reasoning abilities 2,3 including solving tasks widely used to test Theory of Mind in humans 4–6.…”
Section: Introduction (mentioning, confidence: 99%)