A tool for the manual annotation of cross-document entity and event coreferences that helps annotators to label mention coreference relations in text is essential for the annotation of coreference corpora. To the best of our knowledge, CROss-document Main Events and entities Recognition (CROMER) is the only open-source manual annotation tool available for cross-document entity and event coreferences. However, CROMER lacks multi-language support and extensibility. Moreover, to label cross-document mention coreference relations, CROMER requires the support of another intra-document coreference annotation tool known as Content Annotation Tool, which is now unavailable. To address these problems, we introduce Cross-Document Coreference Annotation Tool (CDCAT), a new multi-language open-source manual annotation tool for cross-document entity and event coreference, which can handle different input/output formats, preprocessing functions, languages, and annotation systems. Using this new tool, annotators can label a reference relation with only two mouse clicks. Best practice analyses reveal that annotators can reach an annotation speed of 0.025 coreference relations per second on a corpus with a coreference density of 0.076 coreference relations per word. As the first multi-language open-source cross-document entity and event coreference annotation tool, CDCAT can theoretically achieve higher annotation efficiency than CROMER.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.