Providing learners of a foreign language with meaningful opportunities for interaction, particularly with native speakers, is especially challenging for instructors. One way to overcome this obstacle is through video‐synchronous computer‐mediated communication tools such as Skype. This study reports quantitative and qualitative data from a Skype partner program that emphasized meaningful communication between American students learning Japanese in the United States and Japanese students learning English in Japan. Analysis of pre‐ and posttests showed significant improvement in the listening and speaking abilities of the Japanese participants and in the speaking abilities of the American participants.
Phrasal verbs are important for EFL and ESL education because of their high frequency, but they can be difficult for learners because of their number and polysemy. While there are a number of studies on phrasal verbs, the widening focus of such studies has left a gap between theory and practical instruction. This study improves upon previous cognitive-linguistic approaches to teaching phrasal verbs by combining the theory of event conflation with corpus-based research to create a list of phrasal verb particles and meanings that is concise yet comprehensive enough to account for approximately 95% of common phrasal verb meanings. It also reports the results of an experiment in which learners taught with this particle list improved more on pre-/post-tests of phrasal verbs than learners who studied a list of the most common phrasal verbs as whole entities (p < 0.001, d = 1.34). Quantitative and qualitative data presented in this study also indicate that learners taught with the particle list improved their ability to conjecture the meanings of novel phrasal verbs more effectively than learners who studied common phrasal verbs as whole units.
Log-linear models are arguably the most successful class of graphical models for large-scale applications because of their simplicity and tractability. Learning and inference with these models require calculating the partition function, which is a major bottleneck and intractable for large state spaces. Importance Sampling (IS) and MCMC-based approaches are attractive alternatives. However, the condition of having a "good" proposal distribution is often not satisfied in practice. In this paper, we add a new dimension to efficient estimation via sampling. We propose a new sampling scheme and an unbiased estimator that estimates the partition function accurately in sub-linear time. Our samples are generated in near-constant time using locality sensitive hashing (LSH), and so are correlated and unnormalized. We demonstrate the effectiveness of our proposed approach by comparing the accuracy and speed of estimating the partition function against other state-of-the-art estimation techniques, including IS and the efficient variant of Gumbel-Max sampling. With our efficient sampling scheme, we accurately train real-world language models using only 1-2% of the computations.
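To make the baseline concrete: the abstract's starting point is importance sampling for the partition function Z = Σ_x exp(θ_x), where Z is rewritten as an expectation under a proposal q and estimated by a sample mean. The sketch below illustrates only this classical IS estimator on a toy model with a uniform proposal; the paper's LSH-based sampler and its correlated samples are not reproduced here, and all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy log-linear model over a discrete state space of size K.
K = 10_000
theta = rng.normal(size=K)                 # unnormalized log-potentials theta_x
log_Z_exact = np.logaddexp.reduce(theta)   # tractable only because K is small

# Importance sampling with a uniform proposal q(x) = 1/K:
#   Z = E_q[ exp(theta_x) / q(x) ], estimated by a sample mean.
N = 5_000
samples = rng.integers(0, K, size=N)       # x_i ~ q
weights = np.exp(theta[samples]) * K       # exp(theta_x) / (1/K)
Z_hat = weights.mean()                     # unbiased estimate of Z

print(f"exact Z = {np.exp(log_Z_exact):.1f}, IS estimate = {Z_hat:.1f}")
```

The estimator is unbiased for any proposal with full support, but its variance blows up when q places little mass where exp(θ_x) is large, which is exactly the "good proposal" condition the abstract says is hard to satisfy in practice.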
Computer‐mediated, communication‐based foreign language learning programs are showing great promise. Among these, video‐synchronous mediated communication seems to offer an effective way to provide speaking practice, although research findings have been inconclusive. Furthermore, among studies that have documented the effectiveness of video‐mediated communication, it is not clear why some learners improve more than others. This study reports data from three separate learner groups who engaged in video‐synchronous mediated communication with native speakers. Specifically, the study investigated the effectiveness of such communication on learners’ oral fluency and explored the impact of enjoyment, target language speaking time, and instructional level on improvement. The data suggest that participation did not necessarily guarantee greater improvement when compared with students in a control group; that instructional level is associated with improvement; that allocated time is associated with a decrease in pausing; and that students’ reasons for enjoying the program, rather than their overall enjoyment, are related to improvement.
Question and answer generation is a data augmentation method that aims to improve question answering (QA) models given the limited amount of human-labeled data. However, a considerable gap remains between synthetic and human-generated question-answer pairs. This work aims to narrow this gap by taking advantage of large language models and explores several factors such as model size, quality of pretrained models, scale of data synthesized, and algorithmic choices. On the SQuAD1.1 question answering task, we achieve higher accuracy using solely synthetic questions and answers than when using the SQuAD1.1 training set questions alone. Removing access to real Wikipedia data, we synthesize questions and answers from a synthetic corpus generated by an 8.3 billion parameter GPT-2 model. With no access to human supervision and only access to other models, we are able to train state-of-the-art question answering networks on entirely model-generated data that achieve 88.4 Exact Match (EM) and 93.9 F1 score on the SQuAD1.1 dev set. We further apply our methodology to SQuAD2.0 and show a 2.8 absolute gain on EM score compared to prior work using synthetic data. Consistent with prior work (Alberti et al., 2019a; Dong et al., 2019), we use a 3-step modeling pipeline consisting of unconditional answer extraction from text, question generation, and question filtration. Our approach for training question generators on labeled data uses pretrained GPT-2 decoder models and a next-token-prediction language modeling objective, trained using a concatenation of context, answer, and question tokens. As demonstrated in sections 5.1 and 6.1, pretraining large generative transformer models up to 8.3B parameters improves the quality of generated questions. Additionally, we propose an overgenerate-and-filter approach to further improve question filtration. The quality of questions produced by this pipeline can be assessed quantitatively by finetuning QA models and evaluating results on the SQuAD dataset. We demonstrate generated questions to be comparable to supervised training with real data.
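The 3-step pipeline described above can be sketched as a simple control flow: extract candidate answers from a context, generate a question conditioned on each (context, answer) pair, then keep only pairs that survive roundtrip filtration, i.e. where a QA model's answer to the generated question matches the conditioning answer. In the minimal sketch below the three model calls are deliberate stubs standing in for the finetuned GPT-2 generator and QA filter models; every function name and heuristic here is an illustrative assumption, not the paper's implementation.

```python
def extract_answers(context: str) -> list[str]:
    # Stub answer extractor: pick capitalized tokens as candidate spans.
    # (The paper uses a learned, unconditional answer-extraction model.)
    return [tok for tok in context.split() if tok[0].isupper()]

def generate_question(context: str, answer: str) -> str:
    # Stub for a GPT-2-style generator conditioned on (context, answer).
    return f"Which entity is referred to as {answer}?"

def qa_model(context: str, question: str) -> str:
    # Stub roundtrip QA model used as the filtration signal.
    return question.rsplit(" ", 1)[-1].rstrip("?")

def synthesize(context: str) -> list[tuple[str, str]]:
    """Answer extraction -> question generation -> roundtrip filtration."""
    pairs = []
    for answer in extract_answers(context):
        question = generate_question(context, answer)
        if qa_model(context, question) == answer:  # keep only roundtrip matches
            pairs.append((question, answer))
    return pairs

print(synthesize("Marie Curie discovered polonium in Paris"))
```

The overgenerate-and-filter idea mentioned in the abstract fits naturally into this loop: sample several candidate questions per answer and let the filtration step discard all but the best roundtrip-consistent one.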