2022
DOI: 10.48550/arxiv.2203.13224
Preprint

Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

Cited by 8 publications
(12 citation statements)
“…While proposed algorithms work for any black-box optimization problems, they are most suited for high-dimensional optimization problems where the evaluation of a single updated parameter set is quick, but the sheer number of dimensions makes the search for a valid update excessively slow for a single worker, with a notable application being the neuroevolution of large ANNs. We foresee that such a setting is of particular relevance to multipurpose super neural networks, such as PathNet [16], as well as for conversational agents derived from LLMs [20,51]. Finally, more accessible and reliable byzantine-resilient machine learning allows a variety of entities to poll together their computational resources to train models they could not have trained individually, which has the potential of democratizing state-of-the-art ML research in a non-differentiable setting.…”
[Footnote 6 of the citing paper: As we mention in the introduction, we use Genetic Algorithm only to designate EAs that has "chromosome" and "cross-over" phases, consistently with the nomenclature introduced in [24].]
Section: Discussion (mentioning)
confidence: 99%
“…There are theoretical reasons why approaches that reduce learning in a non-differentiable setting to a differentiable one would underperform compared to gradient-free black-box optimization approaches [39,46]. This is particularly relevant now, given that the latest development in the LLM field is conversation agents, which rely on optimization based on discrete feedback to align their behavior on user expectations and non-differentiable layers of hard attention to solve the issues with rule-following that plague them [20,43,51].…”
Section: Gradient-free Learning (mentioning)
confidence: 99%
“…To this end, knowledge-aware works ground dialogue response on external knowledge (Yu et al, 2020). The knowledge can be gathered from text-based encyclopedia (Lin et al, 2020a; Zhao et al, 2020), commonsense knowledge, search engines (Shuster et al, 2022), and many others (Moghe et al, 2020). Although such works have achieved promising results, they often only focus on high-resource English or Chinese (Kann et al, 2022).…”
Section: Related Work (mentioning)
confidence: 99%
“…We took R2C2 (22) as our base model: a 2.7 billion-parameter Transformer-based (23) encoder-decoder model pretrained on text from the internet by using a BART denoising objective (24). The base pretrained model was then further trained on WebDiplomacy (Methods, Data) through standard maximum likelihood estimation.…”
Section: Imitation Dialogue Model (mentioning)
confidence: 99%