It is often claimed that Named Entity recognition systems need extensive gazetteers-lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and different sizes, using test material from the MUC-7 competition. We show that, for the text type and task of this competition, it is sufficient to use relatively small gazetteers of well-known names, rather than large gazetteers of low-frequency names. We conclude with observations about the domain independence of the competition and of our experiments.
Although recent named entity (NE) annotation efforts involve the markup of nested entities, there has been limited focus on recognising such nested structures. This paper introduces and compares three techniques for modelling and recognising nested entities by means of a conventional sequence tagger. The methods are tested and evaluated on two biomedical data sets that contain entity nesting. All methods yield an improvement over the baseline tagger that is only trained on flat annotation.
We report on two JISC-funded projects that aimed to enrich the metadata of digitized historical collections with georeferences and other information automatically computed using geoparsing and related information extraction technologies. Understanding location is a critical part of any historical research, and the nature of the collections makes them an interesting case study for testing automated methodologies for extracting content. The two projects (GeoDigRef and Embedding GeoCrossWalk) have looked at how automatic georeferencing of resources might be useful in developing improved geographical search capacities across collections. In this paper, we describe the work that was undertaken to configure the geoparser for the collections as well as the evaluations that were performed.
Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence as to its effectiveness. We report on three experiments measuring the extent to which curation can be speeded up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum speed-up of 1/3 in curation time can be expected if NLP output is perfectly accurate. The preference of one curator for consistent NLP output and output with high recall needs to be confirmed in a larger study with several curators.
Abstract. We describe research carried out as part of a text summarisation project for the legal domain for which we use a new XML corpus of judgments of the UK House of Lords. These judgments represent a particularly important part of public discourse due to the role that precedents play in English law. We present experimental results using a range of features and machine learning techniques for the task of predicting the rhetorical status of sentences and for the task of selecting the most summary-worthy sentences from a document. Results for these components are encouraging as they achieve state-of-the-art accuracy using robust, automatically generated cue phrase information. Sample output from the system illustrates the potential of summarisation technology for legal information management systems and highlights the utility of our rhetorical annotation scheme as a model of legal discourse, which provides a clear means for structuring summaries and tailoring them to different types of users.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.