2023
DOI: 10.1098/rsos.230539

General intelligence requires rethinking exploration

Abstract: We are at the cusp of a transition from ‘learning from data’ to ‘learning what data to learn from’ as a central focus of artificial intelligence (AI) research. While the first-order learning problem is not completely solved, large models under unified architectures, such as transformers, have shifted the learning bottleneck from how to effectively train models to how to effectively acquire and use task-relevant data. This problem, which we frame as exploration, is a universal aspect of…

Cited by 7 publications (7 citation statements)
References 114 publications
“…In the context of this scenario, the symbol Y_j denotes the average value observed in the jth treatment or group, as defined in Formula (7). Similarly, Y_ij represents the overall average value across all treatments, as expressed in Formula (8).…”
Section: Source Of Variation (mentioning)
confidence: 99%
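The citing paper's Formulas (7) and (8) are not reproduced in this excerpt; as a hedged reference point only, the group mean and grand mean used in a standard one-way ANOVA source-of-variation decomposition are typically written as follows (generic notation, not necessarily the citing paper's):

```latex
% Generic one-way ANOVA notation (an assumption; Formulas (7) and (8) of the
% citing paper are not shown in the excerpt above).
% Y_{ij}: i-th observation in treatment j; n_j: group size; k: number of groups.
\bar{Y}_{j} = \frac{1}{n_j} \sum_{i=1}^{n_j} Y_{ij},
\qquad
\bar{Y} = \frac{1}{N} \sum_{j=1}^{k} \sum_{i=1}^{n_j} Y_{ij},
\qquad N = \sum_{j=1}^{k} n_j .
```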
“…However, securing a high-quality benchmark remains a formidable challenge. The preliminary phase of the benchmarking theory involves delineating the problem domain, an intricate task due to the requisite "uniform" distribution of test functions across the entire expanse of potential functions within the problem domain [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…PLR has been empirically shown to be more scalable and also is able to achieve more robust and better "out-of-distribution" performance than PAIRED. In [Jiang et al., 2021a], the authors combined the randomized generator and replay mechanism and proposed PLR⊥. Empirically, PLR⊥ achieves the state of the art in the literature (as a method that demands no human expertise), and thus we will adopt PLR⊥ as our primary baseline in this paper.…”
Section: Related Work On UED (mentioning)
confidence: 99%
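For readers unfamiliar with PLR⊥ (robust Prioritized Level Replay), the "randomized generator plus replay mechanism" described in this excerpt can be pictured as a score-prioritized buffer of levels from which training levels are either replayed or freshly generated. The sketch below is illustrative only, with hypothetical names (`LevelBuffer`, `generate_level`, `score_fn`); it is not the cited authors' implementation.

```python
import random

class LevelBuffer:
    """Illustrative sketch of a PLR-style level buffer (hypothetical names;
    not the implementation from Jiang et al., 2021a)."""

    def __init__(self, capacity=256, replay_prob=0.5, temperature=0.1):
        self.capacity = capacity          # max number of stored levels
        self.replay_prob = replay_prob    # chance of replaying vs. generating
        self.temperature = temperature    # sharpness of rank-based sampling
        self.levels = []                  # stored level seeds / parameters
        self.scores = []                  # learning-potential scores (e.g. regret estimates)

    def sample(self, generate_level, score_fn):
        """Return (level, index): replay a high-score level or score a new random one.
        In robust PLR (PLR⊥) the student only takes gradient updates on replayed
        levels; freshly generated levels are merely evaluated to obtain a score."""
        if self.levels and random.random() < self.replay_prob:
            # Rank-based prioritisation over stored scores (higher score = lower rank).
            order = sorted(range(len(self.scores)),
                           key=lambda i: self.scores[i], reverse=True)
            weights = [(1.0 / (rank + 1)) ** (1.0 / self.temperature)
                       for rank in range(len(order))]
            idx = random.choices(order, weights=weights, k=1)[0]
            return self.levels[idx], idx
        # Otherwise draw a new random level, score it, and store it.
        level = generate_level()
        if len(self.levels) >= self.capacity:
            drop = self.scores.index(min(self.scores))  # evict the lowest-potential level
            self.levels.pop(drop)
            self.scores.pop(drop)
        self.levels.append(level)
        self.scores.append(score_fn(level))
        return level, len(self.levels) - 1

    def update(self, idx, new_score):
        """Refresh a level's score after the student has trained or been evaluated on it."""
        self.scores[idx] = new_score
```

A training loop would alternate `sample` and `update`, feeding regret-style scores for each level back into the buffer.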
“…The next set of approaches related to PAIRED [Dennis et al., 2020; Jiang et al., 2021a] rely on regret, which is defined approximately as the difference between the maximum and the mean return of students' policy, to generate new levels:…”
Section: Approaches For Solving UED (mentioning)
confidence: 99%
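The approximate regret described in this excerpt (maximum return minus mean return on a level) can be written compactly. Below is a minimal, hypothetical Python sketch of that quantity, assuming `returns` holds episodic returns collected on a single level; it is not code from the cited papers.

```python
def approx_regret(returns):
    """Approximate regret on a level: gap between the best observed return and
    the average return. A hypothetical sketch of the quantity described in the
    excerpt above, not code from the cited papers."""
    assert returns, "need at least one episodic return"
    return max(returns) - sum(returns) / len(returns)


# Example: returns from several rollouts on the same level.
print(approx_regret([3.0, 5.0, 8.0]))  # 8.0 - 5.33... ≈ 2.67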