Background: Due to the nature of scientific methodology, research articles are rich in speculative and tentative statements, also known as hedges. We explore a linguistically motivated approach to the problem of recognizing such language in biomedical research articles. Our approach draws on prior linguistic work as well as existing lexical resources to create a dictionary of hedging cues and extends it by introducing syntactic patterns.
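The idea of a cue dictionary extended with patterns can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the cue list, the weights, and the names `HEDGE_CUES`, `PATTERN_CUES`, and `find_hedges` do not come from the article. The "pattern" is a surface approximation (a required following word) and not a true syntactic pattern over a parse.

```python
import re

# Hypothetical cue dictionary: cue -> strength (illustrative values only).
HEDGE_CUES = {
    "may": 1.0,
    "might": 1.0,
    "suggest": 0.8,
    "suggests": 0.8,
    "appear": 0.6,
    "appears": 0.6,
    "possible": 0.7,
}

# Hypothetical pattern constraint: these cues count as hedges only when
# immediately followed by the given word ("suggest that", "appear to").
PATTERN_CUES = {
    "suggest": "that",
    "suggests": "that",
    "appear": "to",
    "appears": "to",
}


def find_hedges(sentence):
    """Return (cue, strength) pairs for hedging cues found in the sentence."""
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token not in HEDGE_CUES:
            continue
        required_next = PATTERN_CUES.get(token)
        if required_next is not None:
            next_token = tokens[i + 1] if i + 1 < len(tokens) else None
            if next_token != required_next:
                # Cue is present but the pattern is not satisfied.
                continue
        hits.append((token, HEDGE_CUES[token]))
    return hits
```

With these assumed entries, "suggest" is treated as a hedge in "These results suggest that p53 may regulate apoptosis." but not in "We suggest a new method.", which is the kind of distinction a pattern-based extension is meant to capture.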
We explore a rule-based methodology for the BioNLP'09 Shared Task on Event Extraction, using dependency parsing as the underlying principle for extracting and characterizing events. We approach the speculation and negation detection task with the same principle. Evaluation results demonstrate the utility of this syntax-based approach and point out some shortcomings that need to be addressed in future work.
The scientific literature is the main source for comprehensive, up-to-date biological knowledge. Automatic extraction of this knowledge facilitates core biological tasks, such as database curation and knowledge discovery. We present here a linguistically inspired, rule-based and syntax-driven methodology for biological event extraction. We rely on a dictionary of trigger words to detect and characterize event expressions, and on heuristics based on syntactic dependencies to extract their event arguments. We refine and extend our prior work to recognize speculated and negated events. We show that the dependency-based heuristics used to identify event arguments extend naturally to identifying speculation and negation scope. In the BioNLP'09 Shared Task on Event Extraction, our system placed third in the Core Event Extraction Task (F-score of 0.4462) and first in the Speculation and Negation Task (F-score of 0.4252). Of particular interest is the extraction of complex regulatory events, where it placed second. Our system significantly outperformed other participating systems in detecting speculation and negation. These results demonstrate the utility of a syntax-driven approach. In this article, we also report on our more recent work on supervised learning of event trigger expressions and discuss event annotation issues, based on our corpus analysis.
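A minimal sketch of how a trigger dictionary and dependency heuristics might fit together, and how the same tree traversal could yield speculation and negation scope. This is not the authors' system: the dictionaries, the relation names (Stanford-style collapsed dependencies such as `prep_of` and `neg`), the hand-written parse input, and the function `extract_events` are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical trigger dictionary: trigger word -> event type.
TRIGGERS = {"phosphorylation": "Phosphorylation", "expression": "Gene_expression"}
# Hypothetical speculation cues and argument-bearing dependency relations.
SPECULATION_CUES = {"suggest", "may", "hypothesize"}
ARGUMENT_RELATIONS = {"prep_of", "nn", "dobj"}
NEGATION_RELATION = "neg"


def extract_events(tokens, edges, protein_indices):
    """tokens: list of words; edges: (head, relation, dependent) index triples;
    protein_indices: set of token indices already tagged as proteins."""
    children = defaultdict(list)
    for head, rel, dep in edges:
        children[head].append((rel, dep))

    def subtree(node):
        # All tokens reachable from node by following dependency edges downward.
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for _, dep in children[current]:
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

    # Step 1: a trigger plus a protein attached by an argument relation forms an event.
    events = []
    for index, token in enumerate(tokens):
        event_type = TRIGGERS.get(token.lower())
        if event_type is None:
            continue
        for rel, dep in children[index]:
            if rel in ARGUMENT_RELATIONS and dep in protein_indices:
                events.append({"type": event_type, "trigger": index, "theme": tokens[dep]})

    # Step 2: scope is approximated as the subtree governed by a cue
    # (speculation) or by the head of a "neg" edge (negation).
    speculation_scope, negation_scope = set(), set()
    for index, token in enumerate(tokens):
        if token.lower() in SPECULATION_CUES:
            speculation_scope |= subtree(index)
    for head, rel, _ in edges:
        if rel == NEGATION_RELATION:
            negation_scope |= subtree(head) | {head}

    for event in events:
        event["speculated"] = event["trigger"] in speculation_scope
        event["negated"] = event["trigger"] in negation_scope
    return events
```

For example, given "We suggest phosphorylation of STAT3" with edges `[(1, "nsubj", 0), (1, "dobj", 2), (2, "prep_of", 4)]` and `protein_indices={4}`, the sketch yields one Phosphorylation event with theme STAT3, marked as speculated because its trigger lies in the subtree of "suggest". A real system would take its edges from a dependency parser and would need far richer rules.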
Background: In recent years, biological event extraction has emerged as a key natural language processing task, aiming to address the information overload problem in accessing the molecular biology literature. The BioNLP shared task competitions have contributed considerably to this recent interest. The first competition (BioNLP'09) focused on extracting biological events from Medline abstracts from a narrow domain, while the theme of the latest competition (BioNLP-ST'11) was generalization, and a wider range of text types, event types, and subject domains were considered. We view event extraction as a building block in larger discourse interpretation and propose a two-phase, linguistically-grounded, rule-based methodology. In the first phase, a general, underspecified semantic interpretation is composed from syntactic dependency relations in a bottom-up manner. The notion of embedding underpins this phase, which is informed by a trigger dictionary and argument identification rules. Coreference resolution is also performed at this step, allowing extraction of inter-sentential relations. The second phase is concerned with constraining the resulting semantic interpretation by shared task specifications. We evaluated our general methodology on core biological event extraction and speculation/negation tasks in three main tracks of BioNLP-ST'11 (GENIA, EPI, and ID).
Results: We achieved competitive results in the GENIA and ID tracks, while our results in the EPI track leave room for improvement. One notable feature of our system is that its performance across abstracts and article bodies is stable. Coreference resolution results in a minor improvement in system performance. Due to our interest in discourse-level elements, such as speculation/negation and coreference, we provide a more detailed analysis of our system performance in these subtasks.
Conclusions: The results demonstrate the viability of a robust, linguistically-oriented methodology for biological event extraction, one that clearly distinguishes general semantic interpretation from shared task-specific aspects. Our error analysis pinpoints some shortcomings, which we plan to address in future work within our incremental system development methodology.
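The second phase described above, constraining a general interpretation by task specifications, can be pictured as a filter. The sketch below is only an illustration under stated assumptions: the relation records, the `ALLOWED_THEME_TYPES` table, and the function `constrain` are hypothetical, and the table covers just a few event types with simplified argument constraints.

```python
# Hypothetical, simplified task specification: event type -> allowed theme types.
ALLOWED_THEME_TYPES = {
    "Gene_expression": {"Protein"},
    "Phosphorylation": {"Protein"},
    "Positive_regulation": {"Protein", "Event"},
}


def constrain(relations, allowed=ALLOWED_THEME_TYPES):
    """Keep only relations whose type and theme type the task specification permits."""
    kept = []
    for relation in relations:
        allowed_themes = allowed.get(relation["type"])
        if allowed_themes is None:
            # The general interpretation produced a type the task does not define.
            continue
        if relation["theme_type"] in allowed_themes:
            kept.append(relation)
    return kept
```

Keeping this step separate from interpretation means that, in principle, the same first-phase output could be filtered against a different specification table for another track.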