A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs are produced by a variety of neural and non-neural systems. Our findings suggest that metrics combining contextualized representations with a distance measure perform best. Such metrics also demonstrate strong generalization capability across tasks. For ease of use we make our metrics available as a web service.
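The idea of pairing token representations with a distance measure can be illustrated with a small sketch. This is not the MoverScore implementation: the abstract does not specify the distance, so the sketch assumes a relaxed transport-style distance (each token moves to its nearest counterpart), and the 2-D vectors are hypothetical stand-ins for contextualized embeddings. The function names are likewise illustrative.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def relaxed_mover_distance(sys_vecs, ref_vecs):
    """Relaxed transport cost between two sets of token vectors.

    Assumption: every token carries equal mass and is moved entirely to
    its nearest token on the other side. The larger of the two one-way
    costs is returned. Lower values mean the texts are closer.
    """
    def one_way(src, dst):
        return sum(min(cosine_distance(s, d) for d in dst) for s in src) / len(src)

    return max(one_way(sys_vecs, ref_vecs), one_way(ref_vecs, sys_vecs))


# Hypothetical toy embeddings, one vector per token.
reference = [[1.0, 0.0], [0.0, 1.0]]
paraphrase = [[0.9, 0.1], [0.1, 0.9]]
unrelated = [[-1.0, 0.0], [0.0, -1.0]]

# A semantically close output should be nearer to the reference.
print(relaxed_mover_distance(paraphrase, reference)
      < relaxed_mover_distance(unrelated, reference))
```

In practice the vectors would come from a contextualized encoder, and the resulting scores would be compared against human ratings with a rank correlation.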
We propose a method to perform automatic document summarisation without using reference summaries. Instead, our method interactively learns from users' preferences. The merit of preference-based interactive summarisation is that preferences are easier for users to provide than reference summaries. Existing preference-based interactive learning methods suffer from high sample complexity, i.e. they need to interact with the oracle for many rounds in order to converge. In this work, we propose a new objective function, which enables us to leverage active learning, preference learning and reinforcement learning techniques in order to reduce the sample complexity. Both simulation and real-user experiments suggest that our method significantly advances the state of the art. Our source code is freely available at https
Reinforcement Learning (RL) based document summarisation systems yield state-of-the-art performance in terms of ROUGE scores, because they directly use ROUGE as the reward during training. However, summaries with high ROUGE scores often receive low human ratings. To find a better reward function that can guide RL to generate human-appealing summaries, we learn a reward function from human ratings on 2,500 summaries. Our reward function takes only the document and system summary as input. Hence, once trained, it can be used to train RL-based summarisation systems without using any reference summaries. We show that our learned rewards have significantly higher correlation with human ratings than previous approaches. Human evaluation experiments show that, compared to the state-of-the-art supervised-learning systems and ROUGE-as-rewards RL summarisation systems, the RL systems using our learned rewards during training generate summaries with higher human ratings. The learned reward function and our source code are available at https://github.com/yg211/summary-reward-no-reference.
Objective: The authors sought to compare GABA levels in treated and untreated patients with psychosis with levels in their unaffected siblings and healthy control subjects, and to assess the effects of antipsychotic medications on GABA levels. Method: GABA+ levels (i.e., including signal from unrelated proteins or macromolecules) referenced to creatine or water were studied with J-edited proton spectroscopy in the dorsal anterior cingulate cortex of 289 individuals: 184 healthy control subjects, 83 treated patients with psychosis, 25 untreated patients, 31 unaffected siblings, and 17 patients studied both while off all medications and while on a single antipsychotic. Results: GABA+ levels in the dorsal anterior cingulate did not differ between untreated patients and healthy controls. For treated patients, levels were modestly lower for GABA+/creatine but did not differ for GABA+/water compared with healthy controls. For both GABA+ measures, unaffected siblings had significantly lower levels compared with controls. GABA+/creatine showed a modest degree of familiality (intraclass correlation = 0.36). Antipsychotic dosage was negatively correlated with GABA+ levels, but the on-off medication studies indicated no difference in GABA+ levels on antipsychotics compared with off antipsychotics. Conclusions: GABA+/creatine in the dorsal anterior cingulate may constitute an intermediate phenotype with low effect size for psychosis, but GABA+/water measures do not fully support this conclusion. Low GABA+ levels in unaffected siblings could suggest a genetic association, but the failure to find consistent evidence of this phenotype in the patients themselves weakens genetic inference on risk for psychosis. Replication in independent samples of siblings is warranted to confirm the potential genetic risk association.