2022
DOI: 10.48550/arxiv.2205.11502
Preprint
On the Paradox of Learning to Reason from Data

Abstract: Logical reasoning is needed in a wide range of NLP tasks. Can a BERT model be trained end-to-end to solve logical reasoning problems presented in natural language? We attempt to answer this question in a confined problem space where there exists a set of parameters that perfectly simulates logical reasoning. We make observations that seem to contradict each other: BERT attains near-perfect accuracy on in-distribution test examples while failing to generalize to other data distributions over the exact same prob…

Cited by 11 publications (17 citation statements)
References 14 publications
“…We also observe ILLUME to yield no satisfying results on the CLEVR-X dataset. Details in this regard can be found in Appendix G. In summary, we attribute this behavior to the same observations made by Zhang et al (2022), in that current LMs appear incapable of inferring logical reasoning from a few training examples. Therefore, VLMs bootstrapped from LMs struggle to transfer logical reasoning capabilities without major extensions.…”
Section: (Bottom) | supporting
confidence: 68%
“…Flaws in Logical Reasoning One frequently observed shortcoming of large neural networks is their inability to generalize to logical reasoning. Zhang et al (2022) recently demonstrated that BERT does not learn logical reasoning but instead captures statistical features in the training data. Therefore, the model remains unable to generalize to other distributions of the exact same problem.…”
Section: (Bottom) | mentioning
confidence: 99%
“…Even though the results (in Table 2) showcase an uptick in the number of successful plans, the overall performance is still around 20%. This is unsurprising as [36] point out that language models tend to focus on the inherent statistical features in reasoning problems which affects their performance on such tasks.…”
Section: Appendix A.1 Additional Experiments | mentioning
confidence: 99%
“…After all, modern machine learning algorithms are nothing other than statistical models based on a huge set of matrix multiplications, and there are publications highlighting the inherent limits of such systems. For example, it lies in the nature of their architecture that AI models will never be able to do things that are quite simple for humans, such as making logical inferences or entertaining abstract reasoning, because all they do (and this is what they are really good at) is making extrapolations using statistical operations [29,137,138]. However, such claims may be debatable since they strongly depend on what we mean when making reference to terms like "abstract reasoning" or also just "reasoning" per se [26,49].…”
Section: Implications for the (Digital) Humanities | mentioning
confidence: 99%
“…The most groundbreaking model in April was Flamingo, produced by Google's DeepMind, because it combined a 70B-parameter language model with an image model [3]. In May, Meta AI released OPT-175B with open-sourced code [137,138]. And just in the past few days at the time of this writing, Google AI published LaMDA 2 [117], DeepMind released an intriguing model called Gato with 1.18B parameters [100], and Google Research published UL20B [113]. At the time of this publication, the last few weeks showed groundbreaking innovations in multimodality, combining natural language processing (NLP) with computer vision; the most prominent models are DALL-E 2 by OpenAI [105], Flamingo by DeepMind [3], and, a few days ago, Imagen, introduced by Google [105].…”
mentioning
confidence: 99%