2021 19th International Conference on Information Technology Based Higher Education and Training (ITHET)
DOI: 10.1109/ithet50392.2021.9759726
Detecting Micromanagement During Pair Programming

Abstract: In this paper, we investigate the use of data obtained from prompting a large generative language model, ChatGPT, to generate synthetic training data with the aim of augmenting data in low-resource scenarios. We show that with appropriate task-specific ChatGPT prompts, we outperform the most popular existing approaches for such data augmentation. Furthermore, we investigate methodologies for evaluating the similarity of the augmented data generated from ChatGPT with the aim of validating and assessing the qual…
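The abstract describes prompting ChatGPT with task-specific instructions to produce synthetic training examples for low-resource datasets. A minimal sketch of that idea, assuming a paraphrase-style prompt template and a numbered-list response format (both are illustrative assumptions, not the paper's actual prompts):

```python
# Hypothetical sketch of prompt-based data augmentation: ask a large
# language model to paraphrase a labeled low-resource example, then parse
# the numbered completions into new training examples. The template and
# parser below are illustrative assumptions, not the paper's implementation.

def build_augmentation_prompt(text: str, label: str, n: int = 3) -> str:
    """Construct a task-specific prompt asking for n label-preserving
    paraphrases of one training example."""
    return (
        f"Rewrite the following '{label}' example in {n} different ways, "
        f"keeping the same meaning and label. Number each rewrite.\n\n"
        f"Example: {text}"
    )

def parse_numbered_completions(completion: str) -> list[str]:
    """Split a numbered model completion ('1. ...', '2. ...') into
    individual synthetic examples."""
    examples = []
    for line in completion.splitlines():
        line = line.strip()
        if line[:1].isdigit() and "." in line:
            examples.append(line.split(".", 1)[1].strip())
    return examples
```

The prompt string would be sent to the model through an LLM API; the parsed paraphrases are then added to the training set under the original example's label.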

Cited by 4 publications (3 citation statements) | References 17 publications
“…Gao et al. (2023) proposed SunGen, a novel noise-robust re-weighting framework that automatically constructs high-quality data for zero-shot classification problems. Ubani et al. (2023) investigated prompting a large generative language model to generate synthetic training data for few-shot learning. Tang et al. (2023) proposed generating a vast quantity of high-quality labeled synthetic data with ChatGPT and fine-tuning a local model for the downstream task.…”
Section: Black-box KD
confidence: 99%
“…Nascent work also demonstrates that automatically generated annotations for dialog acts are effective for understanding learning. Recent studies in learning analytics employed NLP techniques to analyze collaborative problem-solving, such as identifying collaborative skills through student speech [51], detecting language patterns in pair programming [60], and classifying interactions in collaborative science tasks [21].…”
Section: Introduction
confidence: 99%
“…Nonetheless, existing approaches are inadequate for compressing LLMs due to their exceptionally high compression ratios. Some prior research (Wang et al., 2022; Dai et al., 2023; Ubani et al., 2023) has suggested utilizing LLMs for data augmentation and knowledge transfer to small-scale models, which allows the latter to demonstrate improved performance on low-resource datasets. However, when tackling more challenging tasks like the SuperGLUE benchmark (Wang et al., 2019a), the limited parameter size of small-scale models becomes a hindrance, preventing them from effectively retaining the knowledge transferred by LLMs.…”
Section: Introduction
confidence: 99%