This paper proposes a Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which a named entity (NE) recognition (NER) system is built to recognize and classify names, times and numerical quantities. Through the HMM, our system is able to apply and integrate four types of internal and external evidences: 1) simple deterministic internal feature of the words, such as capitalization and digitalization; 2) internal semantic feature of important triggers; 3) internal gazetteer feature; 4) external macro context feature. In this way, the NER problem can be resolved effectively. Evaluation of our system on MUC-6 and MUC-7 English NE tasks achieves F-measures of 96.6% and 94.1% respectively. It shows that the performance is significantly better than reported by any other machine-learning system. Moreover, the performance is even consistently better than those based on handcrafted rules.
The world's coastal areas are increasingly at risk of coastal flooding due to sea-level rise (SLR). We present a novel global dataset of extreme sea levels, the Coastal Dataset for the Evaluation of Climate Impact (CoDEC), which can be used to accurately map the impact of climate change on coastal regions around the world. The third generation Global Tide and Surge Model (GTSM), with a coastal resolution of 2.5 km (1.25 km in Europe), was used to simulate extreme sea levels for the ERA5 climate reanalysis from 1979 to 2017, as well as for future climate scenarios from 2040 to 2100. The validation against observed sea levels demonstrated a good performance, and the annual maxima had a mean bias (MB) of-0.04 m, which is 50% lower than the MB of the previous GTSR dataset. By the end of the century (2071-2100), it is projected that the 1 in 10-year water levels will have increased 0.34 m on average for RCP4.5, while some locations may experience increases of up to 0.5 m. The change in return levels is largely driven by SLR, although at some locations changes in storms surges and interaction with tides amplify the impact of SLR with changes up to 0.2 m. By presenting an application of the CoDEC dataset to the city of Copenhagen, we demonstrate how climate impact indicators derived from simulation can contribute to an understanding of climate impact on a local scale. Moreover, the CoDEC output locations are designed to be used as boundary conditions for regional models, and we envisage that they will be used for dynamic downscaling.
Extracting semantic relationships between entities is challenging. This paper investigates the incorporation of diverse lexical, syntactic and semantic knowledge in feature-based relation extraction using SVM. Our study illustrates that the base phrase chunking information is very effective for relation extraction and contributes to most of the performance improvement from syntactic aspect while additional information from full parsing gives limited further enhancement. This suggests that most of useful information in full parse trees for relation extraction is shallow and can be captured by chunking. We also demonstrate how semantic information such as WordNet and Name List, can be used in feature-based relation extraction to further improve the performance. Evaluation on the ACE corpus shows that effective incorporation of diverse features enables our system outperform previously best-reported systems on the 24 ACE relation subtypes and significantly outperforms tree kernel-based systems by over 20 in F-measure on the 5 ACE relation types.
This paper proposes a novel composite kernel for relation extraction. The composite kernel consists of two individual kernels: an entity kernel that allows for entity-related features and a convolution parse tree kernel that models syntactic information of relation examples. The motivation of our method is to fully utilize the nice properties of kernel methods to explore diverse knowledge for relation extraction. Our study illustrates that the composite kernel can effectively capture both flat and structured features without the need for extensive feature engineering, and can also easily scale to include more features. Evaluation on the ACE corpus shows that our method outperforms the previous best-reported methods and significantly outperforms previous two dependency tree kernels for relation extraction.
In this paper, we propose a multi-criteriabased active learning approach and effectively apply it to named entity recognition. Active learning targets to minimize the human annotation efforts by selecting examples for labeling. To maximize the contribution of the selected e xamples, we consider the multiple criteria: informativeness, representativeness and diversity and propose measures to quantify them. More comprehensively, we incorporate all the criteria using two selection strategies, both of which result in less labeling cost than single-criterion-based method. The results of the named entity recognition in both MUC-6 and GENIA show that the labeling cost can be reduced by at least 80% without degrading the performance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.