Large language models (LLMs) and LLM-driven chatbots such as ChatGPT have shown remarkable capacities for comprehending and producing language. However, their internal workings remain a black box in cognitive terms, and it is unclear whether LLMs and chatbots can develop human-like characteristics in language use. Cognitive scientists have devised many experiments that probe, and have made great progress in explaining, how people process language. We subjected ChatGPT to 12 of these experiments, preregistered and with 1,000 runs per experiment. In 10 of them, ChatGPT replicated the human pattern of language use. It associated unfamiliar words with different meanings depending on their forms, continued to access recently encountered meanings of ambiguous words, reused recent sentence structures, reinterpreted implausible sentences that were likely to have been corrupted by noise, glossed over errors, drew reasonable inferences, associated causality with different discourse entities according to verb semantics, and accessed different meanings and retrieved different words depending on the identity of its interlocutor. However, unlike humans, it did not prefer using shorter words to convey less informative content, and it did not use context to disambiguate syntactic ambiguities. We discuss how these convergences and divergences may arise in the transformer architecture. Overall, these experiments demonstrate that LLM-driven chatbots like ChatGPT are capable of mimicking human language processing to a great extent, and that they have the potential to provide insights into how people learn and use language.
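To make the protocol in this abstract (one prompt per trial, 1,000 runs per experiment, responses coded against the human pattern) more concrete, below is a minimal sketch of how one such experiment, e.g. a structural-priming test of whether the model reuses a recently encountered sentence structure, might be run through the OpenAI chat API. The model name, prime/target items, and response-coding rule are illustrative assumptions, not the authors' actual preregistered materials.

# Minimal sketch of one experiment run repeatedly against a chat model.
# Assumptions: the OpenAI Python client is installed and OPENAI_API_KEY is set;
# the prime item and the coding rule are placeholders, not the paper's stimuli.
from collections import Counter
from openai import OpenAI

client = OpenAI()

PRIME = "The nurse gave the patient the medicine."  # double-object prime (assumed item)
TARGET_PROMPT = (
    f'Read this sentence: "{PRIME}"\n'
    "Now describe, in one sentence, a scene where a teacher transfers a book to a student."
)

def run_once() -> str:
    """Send one trial to the model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",        # model choice is an assumption
        messages=[{"role": "user", "content": TARGET_PROMPT}],
        temperature=1.0,              # sample, so repeated runs can differ
    )
    return response.choices[0].message.content.strip()

def code_structure(reply: str) -> str:
    """Toy coding rule: ' to the ' suggests a prepositional dative, otherwise double object."""
    return "prepositional" if " to the " in reply.lower() else "double_object"

# 1,000 runs per experiment, as in the abstract (use a smaller n when trying this out).
tallies = Counter(code_structure(run_once()) for _ in range(1000))
print(tallies)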
Recent large language models (LLMs) and LLM-driven chatbots, such as ChatGPT, have sparked debate regarding whether these artificial systems can develop human-like linguistic capacities. We examined this issue by investigating whether ChatGPT resembles humans in its ability to enrich the literal meanings of utterances with pragmatic implicatures. Humans not only distinguish implicatures from the truth-conditional meanings of utterances but also compute implicatures contingent on the communicative context. In three preregistered experiments (https://osf.io/4bcx9/), we assessed whether ChatGPT resembles humans in the computation of pragmatic implicatures. Experiment 1 investigated generalized conversational implicatures (GCIs); for example, the utterance “She walked into the bathroom. The window was open.” carries the conversational implicature that the window is located in the bathroom, whereas the truth-conditional (literal) meaning of the utterance allows for the possibility that the window is located elsewhere. Humans demonstrate their ability to distinguish GCIs from truth-conditional meanings by inhibiting the computation of GCIs when explicitly instructed to focus on the literal sense of utterances. We tested whether ChatGPT could likewise inhibit the computation of GCIs. Experiments 2 and 3 investigated whether the communicative context modulates how ChatGPT computes a specific type of GCI, namely scalar implicatures (SIs). For humans, the sentence “Julie had found a crab or a starfish” implies that Julie had not found both a crab and a starfish, even though the sentence’s literal meaning allows for this possibility. Moreover, this implicature is argued to be more available when the word “or” is in the information focus, e.g., as a reply to the question “What had Julie found?”, than when it is in the information background, e.g., as a reply to the question “Who had found a crab or a starfish?”. Experiment 2 tested whether ChatGPT shows similar sensitivity to information structure when computing SIs. Experiment 3 focused on a different contextual aspect, investigating whether face-threatening and face-boosting contexts have different effects on how ChatGPT computes SIs. Previous research has shown that human interlocutors compute more SIs in face-boosting contexts, e.g., interpreting the utterance “Some people loved your poem.” as meaning “Not all people loved your poem.”, but fewer in face-threatening contexts; we tested whether ChatGPT exhibits a similar tendency. In all three experiments, ChatGPT did not display human-like flexibility in switching between pragmatic and semantic processing and failed to show the well-established effects of communicative context on the SI rate. Overall, our experiments demonstrate that although ChatGPT parallels or even surpasses humans in many linguistic tasks, it still does not closely resemble human beings in the computation of GCIs.
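To make the Experiment 3 manipulation more concrete, here is a minimal sketch of how a single scalar-implicature item might be presented to a chat model under a face-boosting versus a face-threatening framing and scored for whether the pragmatic (“not all”) reading is endorsed. The framing texts, question wording, yes/no scoring, and model name are assumptions for illustration; they are not the preregistered materials at the OSF link above.

# Sketch: present one SI item in two contexts and count "not all" (pragmatic) readings.
# Assumptions: OpenAI Python client, illustrative wording and scoring; not the actual stimuli.
from openai import OpenAI

client = OpenAI()

CONTEXTS = {
    "face_boosting": "You ask a friend how well your poem was received. They reply:",
    "face_threatening": "You ask a friend how badly your poem was received. They reply:",
}
UTTERANCE = "Some people loved your poem."
QUESTION = "Does the reply imply that not all people loved the poem? Answer only Yes or No."

def pragmatic_rate(context: str, n_runs: int = 20) -> float:
    """Return the proportion of runs in which the model endorses the 'not all' implicature."""
    yes = 0
    for _ in range(n_runs):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # model choice is an assumption
            messages=[{"role": "user",
                       "content": f'{context} "{UTTERANCE}"\n{QUESTION}'}],
            temperature=1.0,
        )
        if response.choices[0].message.content.strip().lower().startswith("yes"):
            yes += 1
    return yes / n_runs

for label, context in CONTEXTS.items():
    print(label, pragmatic_rate(context))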