The Winograd Schema Challenge (WSC) (Levesque, Davis, and Morgenstern 2011), a benchmark for commonsense reasoning, is a set of 273 expert-crafted pronoun resolution problems originally designed to be unsolvable for statistical models that rely on selectional preferences or word associations. However, recent advances in neural language models have already reached around 90% accuracy on variants of WSC. This raises the important question of whether these models have truly acquired robust commonsense capabilities or whether they rely on spurious biases in the datasets that lead to an overestimation of the true capabilities of machine commonsense. To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. The best state-of-the-art methods on WinoGrande achieve 59.4 – 79.1%, which is ∼15-35% (absolute) below human performance of 94.0%, depending on the amount of training data allowed (2% – 100% respectively). Furthermore, we establish new state-of-the-art results on five related benchmarks: WSC (→ 90.1%), DPR (→ 93.1%), COPA (→ 90.6%), KnowRef (→ 85.6%), and Winogender (→ 97.1%). These results have dual implications: on one hand, they demonstrate the effectiveness of WinoGrande when used as a resource for transfer learning. On the other hand, they raise a concern that we are likely to be overestimating the true capabilities of machine commonsense across all these benchmarks. We emphasize the importance of algorithmic bias reduction in existing and future benchmarks to mitigate such overestimation.
How do we know which grammatical error correction (GEC) system is best? A number of metrics have been proposed over the years, each motivated by weaknesses of previous metrics; however, the metrics themselves have not been compared to an empirical gold standard grounded in human judgments. We conducted the first human evaluation of GEC system outputs, and show that the rankings produced by metrics such as MaxMatch and I-measure do not correlate well with this ground truth. As a step towards better metrics, we also propose GLEU, a simple variant of BLEU, modified to account for both the source and the reference, and show that it hews much more closely to human judgments.
We present a new parallel corpus, the JHU FLuency-Extended GUG corpus (JFLEG), for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits not only to correct grammatical errors but also to make the original text more native-sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and how they can improve. JFLEG fulfills the need for a new gold standard to properly assess the current state of GEC.
Marine bacteria that kill the noxious red tide flagellate Chattonella antiqua (Raphidophyceae) were screened and isolated from northern Hiroshima Bay, the Seto Inland Sea, Japan in 1991. Four strains (S, K, D, R) of Alteromonas spp. were selected and examined for the characteristics of their algicidal activities. Strains S and R showed a wide algicidal range, killing all cells of the 3 raphidophycean flagellates, 2 diatoms, and one dinoflagellate examined in co-culture. The algicidal activities of strains K and D depended on the prey phytoplankton species. Bacterial culture filtrate experiments showed that strains K and D kill C. antiqua by means of extracellular products, whereas strains S and R kill not by such substances but by predation. When one or two bacterial cells were inoculated into a C. antiqua culture, all of the host cells were killed by the 4 strains of algicidal bacteria within 7 days. All 4 bacterial strains could proliferate in filter-sterilized seawater, indicating their ubiquitous existence in the coastal sea. We suggest that algicidal activity by bacteria may be a significant factor influencing the population dynamics of phytoplankton, and might account for the rapid termination of red tides in the coastal sea.
SUMMARY: Triggered by the explosion of mobile traffic, 5G (5th Generation) cellular networks must evolve to raise the system rate to 1000 times that of current systems within 10 years. Motivated by this common problem, several studies have sought to integrate mm-wave access into current cellular networks as multi-band heterogeneous networks to exploit the ultra-wideband aspect of the mm-wave band. The authors of this paper have proposed a comprehensive architecture of cellular networks with mm-wave access, in which mm-wave small cell basestations and a conventional macro basestation are connected to a Centralized-RAN (C-RAN). The C-RAN operates the system effectively by enabling power-efficient seamless handover as well as centralized resource control, including dynamic cell structuring that matches the limited coverage of mm-wave access to high-traffic user locations via user-plane/control-plane splitting. In this paper, to prove the effectiveness of the proposed 5G cellular networks with mm-wave access, a system-level simulation is conducted by introducing an expected future traffic model, a measurement-based mm-wave propagation model, and a centralized cell association algorithm that exploits the C-RAN architecture. The numerical results show that the proposed network can realize a system rate 1000 times higher than that of the current network in 10 years, which is not achieved by small cells using the commonly considered 3.5 GHz band. Furthermore, the paper also reports the latest status of mm-wave devices and regulations to show the feasibility of using mm-wave in 5G systems.