2022
DOI: 10.1038/s41467-022-30761-2
Towards artificial general intelligence via a multimodal foundation model

Abstract: The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted to various downstream cognitive tasks. To achieve this goal, we propose to pre-train our foundation model …

Cited by 106 publications (40 citation statements)
References 34 publications
“…45 Existing artificial neural network models also do not yet reach human performance on tasks involving multimodal integration. 39,40 For these reasons, future work to mimic the coding properties of anterior temporal lobe structures in artificial neural networks may enable machines to better approximate the remarkable human ability to learn concepts.…”
Section: Results (confidence: 99%)
“…An important line of future investigation may be to explore whether the same experience-dependent changes occur in artificial neural networks that aim to learn multimodal object concepts. 39,40 Previous human neuroimaging has shown that the anterior temporal lobes are important for intra-object configural representations, 41 such that damage to the perirhinal cortex 14,42 leads to object discrimination impairment. For example, human participants with perirhinal cortex damage are unable to resolve feature-level interference created by viewing multiple objects with overlapping features.…”
Section: Discussion (confidence: 99%)
“…6). Recently, (Fei et al., 2022) have employed a similar technique to visualize what they call the "imagination" of a model trained to bring closer an image and a matching textual description in a higher-level feature space. In a similar way, my imaging consciousness could try to strongly re-activate the processes that allow me to recognize (in perception) my friend as being "blond, tall, with a snub or aquiline nose, etc.".…”
Section: Sartrean Imagination: Conscious Re-presentation of Sedimente... (confidence: 99%)
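The "imagination" visualization described above works, at a high level, by optimizing an input until its encoding moves toward a matching text embedding. A minimal sketch of that feature-inversion idea, assuming a toy linear encoder `W` (the actual model uses a deep image encoder and optimizes over pixels):

```python
import numpy as np

def imagine(text_emb, W, steps=500, lr=0.1, seed=0):
    """Toy 'imagination' by feature inversion: gradient-descend an input x so
    that its (assumed linear) encoding W @ x approaches a target text embedding.
    In BriVL-style models the encoder is a deep network and x is an image."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=W.shape[1])   # random starting "image"
    t = np.asarray(text_emb, dtype=float)
    for _ in range(steps):
        residual = W @ x - t          # how far the encoding is from the text target
        x -= lr * (W.T @ residual)    # gradient step on 0.5 * ||W @ x - t||^2
    return x
```

With `W` set to the identity, the recovered `x` converges to the text embedding itself; with a real encoder, the same loop yields an input whose features the model "sees" as matching the description.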
“…In [33], to enhance the performance of transferring models in a zero-shot fashion, LiT, namely Locked-image Tuning, was proposed using contrastive-tuning methods. BriVL (Bridging-Vision-and-Language) was proposed in [34] as a method for obtaining multi-cognitive abilities and developing general AI by leveraging weak semantic correlation data. More details about big AI models can be found in [23].…”
Section: B. Big AI Models (confidence: 99%)
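Both LiT and BriVL build on a contrastive image-text objective: matched pairs are pulled together and mismatched pairs pushed apart in a shared embedding space. A minimal NumPy sketch of that CLIP-style InfoNCE loss (not either paper's exact implementation; the `temperature` value and the symmetric two-direction averaging are common conventions):

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.
    Row i of img_emb and row i of txt_emb form a matched image-text pair."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (batch, batch) similarity matrix
    n = logits.shape[0]

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        # The diagonal entries are the matched (positive) pairs
        return -log_prob[np.arange(n), np.arange(n)].mean()

    # Average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

When the paired embeddings are perfectly aligned and mutually orthogonal across the batch, the loss is near zero; deliberately mismatching the pairs drives it up, which is the signal that pushes the two encoders toward a shared space.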