2022
DOI: 10.48550/arxiv.2212.00193
Preprint

Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions

Cited by 9 publications (9 citation statements)
References 0 publications

“…Its computational efficiency makes it practical for real-world situations where cost and speed are important considerations. For instance, our method has the potential to utilize the capabilities of LLMs through knowledge distillation, serving as a student model to improve performance while maintaining efficiency, like (Ho et al, 2022; Magister et al, 2022; Shridhar et al, 2022; Liang et al, 2023a). Therefore, we believe this work has its value to the research field.…”
Section: Discussion On Large Language Models (mentioning)
confidence: 99%
“…In fact, our solver even outperforms the original chain-of-thought approach on the MAWPS dataset. Additionally, our solver has a more transparent structure, is amenable to further fine-tuning, and has the potential to be combined with LLMs via knowledge distillation, as in (Ho et al, 2022; Magister et al, 2022; Shridhar et al, 2022; Liang et al, 2023a). Given these strengths, we believe our work represents a meaningful contribution to the field, providing a powerful Seq2Tree model for MWP solving tasks and advancing the research community's understanding of these problems.…”
Section: Large Language Models In Math Word (mentioning)
confidence: 99%
“…(Magister et al, 2022) assesses the efficacy of chain-of-thought explanations in the training of a small model across three disparate tasks, namely arithmetic reasoning, commonsense reasoning, and symbolic reasoning. Furthermore, (Shridhar et al, 2022b) presents Decompositional Distillation, an approach that segments the problem into subproblems to enhance smaller models' performance.…”
Section: Large Language Models For Knowledge Distillation and Data Ge... (mentioning)
confidence: 99%
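To make the decomposition idea in the statement above concrete, here is a minimal sketch of the teacher side of such a pipeline. It is an illustration only, not the cited authors' implementation: the prompt wording, the `decompose` helper, the toy problem, and the use of `gpt2-xl` as a stand-in teacher are all assumptions; the cited works distill from far larger teacher models.

```python
# Illustrative sketch: a teacher LM is prompted to break a problem into
# subquestions and answer them; the outputs become distillation targets.
# Names, prompt, and models are placeholders, not the cited authors' code.
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "gpt2-xl"  # stand-in teacher for demonstration purposes
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

def decompose(problem: str, max_new_tokens: int = 128) -> str:
    """Prompt the teacher to split a problem into subquestions and answer each one."""
    prompt = (
        "Solve the problem by first listing subquestions, then answering each.\n"
        f"Problem: {problem}\nSubquestions and answers:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = teacher.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated continuation, not the echoed prompt.
    continuation = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(continuation, skip_special_tokens=True)

# Each (problem, teacher decomposition) pair becomes a training example
# for the smaller student model.
problems = ["Tom has 3 apples and buys 2 more. How many apples does he have now?"]
distillation_data = [{"input": p, "target": decompose(p)} for p in problems]
```

In practice the generated decompositions would typically be filtered (for example, by checking the final answer) before being used as supervision for the student.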
“…student models). The majority of prior research has emphasized using the "explanation" component of the CoT approach as the distilled knowledge (Ho et al, 2022; Li et al, 2022a; Shridhar et al, 2022b; Magister et al, 2022). Nonetheless, these methodologies exhibit certain limitations.…”
mentioning
confidence: 99%
“…By doing so, knowledge of the teacher model is effectively distilled into a much smaller student model, allowing a similar level of performance as the teacher at a lower computational cost. Shridhar et al (2022) distill a GPT-3 (6B) model into a GPT-2 model for a chain-of-thought (CoT) reasoning task. Liang et al (2021) propose MixKD to encourage the student model to mimic the teacher's behavior on not only the available training examples but also on interpolated examples.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
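To illustrate the distillation step described in the statement above, the sketch below fine-tunes a small student model on teacher-written rationales (sequence-level distillation). Again, this is an assumed, minimal illustration: the model choice, the hard-coded data, and the hyperparameters are placeholders, not the setup of any cited paper.

```python
# Minimal sketch of the student side: fine-tune a small causal LM on
# (problem, teacher rationale) pairs. Data and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "gpt2"  # small student, standing in for the GPT-2 student mentioned above
tok = AutoTokenizer.from_pretrained(student_name)
tok.pad_token = tok.eos_token
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

# Placeholder distillation set; in practice these targets come from the teacher model.
distillation_data = [
    {
        "input": "Tom has 3 apples and buys 2 more. How many apples does he have now?",
        "target": "Subquestion 1: How many apples were bought? Answer: 2. "
                  "Subquestion 2: What is 3 + 2? Answer: 5. Final answer: 5.",
    }
]

def train_step(example: dict) -> float:
    """One gradient step on a single (problem, rationale) pair."""
    text = example["input"] + "\n" + example["target"] + tok.eos_token
    batch = tok(text, return_tensors="pt", truncation=True, max_length=512)
    # Standard causal LM loss over the whole sequence; masking the prompt
    # tokens out of the loss is a common refinement omitted here.
    out = student(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

student.train()
for example in distillation_data:
    print(f"loss: {train_step(example):.3f}")
```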