Contract language is repetitive (Anderson and Manns, 2017), but so is all language (Zipf, 1949). In this paper, we measure the extent to which contract language in English is repetitive compared with the language of other English-language corpora. Contracts have much smaller vocabulary sizes than similarly sized non-contract corpora across multiple contract types, contain one-fifth as many hapax legomena, pattern differently on a log-log plot, use fewer pronouns, and contain sentences that are about 20% more similar to one another than in other corpora. These findings suggest that the study of contracts in natural language processing controls for some linguistic phenomena and allows for more in-depth study of others.
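The lexical measures mentioned in the abstract (vocabulary size and hapax legomenon counts) can be sketched with a minimal token-counting routine; the tokenizer and sample text below are illustrative assumptions, not the paper's actual pipeline.

```python
from collections import Counter

def lexical_stats(tokens):
    """Return (vocabulary size, hapax legomenon count) for a token list.
    A hapax legomenon is a type that occurs exactly once in the corpus."""
    counts = Counter(tokens)
    vocab_size = len(counts)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return vocab_size, hapaxes

# Toy contract-like sentence for illustration only.
tokens = "the party shall pay the fee and the party shall deliver".split()
vocab, hapax = lexical_stats(tokens)  # vocab = 7, hapax = 4
```

Comparing these statistics across equally sized samples of contract and non-contract text is what lets the repetitiveness claim be quantified rather than asserted.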
The unsupervised extraction of narrative schemas (sets of events with associated argument chains) has been explored and evaluated from many angles (Chambers and Jurafsky, 2009; Jans et al., 2012; Balasubramanian et al., 2013; Pichotta and Mooney, 2014). While the extraction process and the evaluation of its products have been well researched and debated, little insight has been garnered into the properties of narrative schemas themselves. We examine how well extracted narrative schemas align with existing document categories using a novel procedure for retrieving candidate category alignments. This was tested against alternative baseline alignment procedures that disregard some of the complex information the schemas contain. We find that a classifier built with all available information in a schema is more precise than a classifier built with simpler subcomponents. Coreference information plays a crucial role in schematic knowledge.
Genre and domain are well known covariates of both manual and automatic annotation quality. Comparatively less is known about the effect of sentence types, such as imperatives, questions or fragments, and how they interact with text type effects. Using mixed effects models, we evaluate the relative influence of genre and sentence types on automatic and manual annotation quality for three related tasks in English data: POS tagging, dependency parsing and coreference resolution. For the latter task, we also develop a new metric for the evaluation of individual regions of coreference annotation. Our results show that while there are substantial differences between manual and automatic annotation in each task, sentence type is generally more important than genre in predicting errors within our data.
In this paper, we investigate the distribution of narrative schemas (Chambers and Jurafsky, 2009) throughout different document categories and how the structure of narrative schemas is conditioned by document category, the converse of the relationship explored in Simonson and Davis (2015). We evaluate cross-category narrative differences by assessing the predictability of verbs in each category and the salience of the arguments to events that narrative schemas highlight. For the former, we use the narrative cloze task employed in previous work on schemas. For the latter, we introduce a task that employs narrative schemas, called narrative argument salience through entities annotated, or NASTEA. We compare the schemas induced from the entire corpus to those from the subcorpora for each topic using these two types of evaluation. Results of each evaluation vary by topical subcorpus, in some cases showing improvement, but the NASTEA task additionally reveals that the documents within some topics are significantly more rigid in their narrative structure, instantiating a limited number of schemas in a highly predictable fashion.
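The narrative cloze task the abstract refers to can be illustrated with a minimal co-occurrence model: hold one event out of a protagonist's chain and rank all known events by how often they co-occur with the remaining events. The toy chains, the raw-count scoring, and the use of bare verbs instead of (verb, dependency) pairs are simplifying assumptions for illustration; published systems use PMI or learned models over much larger corpora.

```python
from collections import Counter
from itertools import combinations

# Toy corpus of event chains, each the sequence of events sharing a
# protagonist (abbreviated here to bare verbs for readability).
chains = [
    ["arrest", "charge", "convict", "sentence"],
    ["arrest", "charge", "acquit"],
    ["charge", "convict", "sentence"],
]

event_counts = Counter(e for ch in chains for e in ch)
pair_counts = Counter(
    frozenset(p) for ch in chains for p in combinations(set(ch), 2)
)

def cloze_rank(chain, held_out):
    """Rank every known event by total co-occurrence with the rest of the
    chain; return the 1-based rank of the held-out event (lower is better)."""
    observed = [e for e in chain if e != held_out]
    def score(cand):
        return sum(pair_counts[frozenset((cand, e))]
                   for e in observed if cand != e)
    ranked = sorted(event_counts, key=score, reverse=True)
    return ranked.index(held_out) + 1

cloze_rank(["arrest", "charge", "convict", "sentence"], "convict")  # 1
```

Averaging this rank over many held-out events yields the cloze score used to compare schema models across topical subcorpora.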
This paper presents a technique for the identification of participant slots in English-language contracts. Taking inspiration from unsupervised slot extraction techniques, the system presented here uses a supervised approach to identify terms used to refer to a genre-specific slot in novel contracts. We evaluate the system in multiple feature configurations to demonstrate that the best-performing system in both genres of contracts omits the exact mention form from consideration (even though such mention forms are often the name of the slot under consideration) and is instead based solely on the dependency label and parent; in other words, a more reliable quantification of a party's role in a contract is found in what they do rather than what they are named.
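The abstract's key feature choice (dependency label plus parent, with the mention form omitted) can be sketched as a feature extractor. The mention representation and field names below are hypothetical; the paper's actual feature pipeline is not specified here.

```python
# Hypothetical sketch: represent a contract mention by its dependency
# label and governing word (parent), deliberately omitting the surface
# form, per the best-performing configuration described in the abstract.
def slot_features(mention):
    """mention: dict with assumed keys 'form', 'deprel', 'parent'.
    Returns the feature tuple fed to the slot classifier: the mention's
    grammatical role and what it does, not what it is named."""
    return (mention["deprel"], mention["parent"])

m = {"form": "Lessee", "deprel": "nsubj", "parent": "pay"}
slot_features(m)  # ("nsubj", "pay")
```

Dropping the surface form forces the classifier to generalize across contracts that name the same role differently (e.g. "Lessee" vs. "Tenant"), which is the intuition behind the result.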