2022
DOI: 10.1162/tacl_a_00469
LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation

Abstract: Standard multi-task benchmarks are essential for developing pretraining models that can generalize to various downstream tasks. Existing benchmarks for natural language processing (NLP) usually focus only on understanding or generating short texts. However, long text modeling requires many distinct abilities in contrast to short texts, such as the modeling of long-range discourse and commonsense relations, and the coherence and controllability of generation. The lack of standardized benchmarks makes it difficult…

Cited by 13 publications (21 citation statements)
References 41 publications
“…39,093 examples; synthesis and crowd-sourcing. LOT [55]: Is a continuation of collected texts plausible (Chinese)? 38,000 examples; expert annotations. Located Near [146]: Is X located near Y?…”
Section: Collections Of Elementary Mathematical Word Problems
confidence: 99%
“…We use STORAL-ZH [18], the Chinese portion of STORAL [6], as the dataset for the target tasks. Furthermore, LongLM-base [21], which has been pre-trained on 120G of Chinese long novels, is selected as our model. For TAPT, the language model is further trained on unlabeled STORAL-ZH to equip it with task-aware knowledge.…”
Section: Story
confidence: 99%
“…The story must remain thematically consistent across the complete document while preserving creativity. OutGen is an outline-conditioned story generation dataset introduced by Guan et al. (2021), which requires generating a coherent long-form story conditioned on an outline of characters and events. The outline is a set of out-of-order phrases.…”
Section: B Datasets
confidence: 99%
“…The outline is a set of out-of-order phrases. We use the same data split and evaluation metrics provided by Guan et al. (2021).…”
Section: B Datasets
confidence: 99%