OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction

Kolluru, Keshav; Adlakha, Vaibhav; Aggarwal, Samarth; Mausam; Chakrabarti, Soumen

doi:10.18653/v1/2020.emnlp-main.306

Cited by 53 publications (80 citation statements)

References 31 publications

“…While the data imported from various structured sources provides the most reliable part of the graph, it is enriched using large-scale relation extraction from free-text sources such as scientific literature from PubMed. The NLP aspects of BIKG is based on a series of pipelines, ranging from simple entity co-occurrence and traditional rule based dependency parsing, to state-of-the-art relationship classification with RBERT [49] and open information extraction with OpenIE6 [26] neural information extraction system. In terms of quantity, this NLP-extracted data constitutes the largest component of the graph, providing around 80% of graph edges.…”

Section: NLP for Graph Population (mentioning; confidence: 99%)

Biological Insights Knowledge Graph: an integrated knowledge graph to support drug development

Geleta, Nikolov, Edwards, et al. 2021

Preprint

The use of knowledge graphs as a data source for machine learning methods to solve complex problems in life sciences has rapidly become popular in recent years. Our Biological Insights Knowledge Graph (BIKG) combines relevant data for drug development from public as well as internal data sources to provide insights for a range of tasks: from identifying new targets to repurposing existing drugs. Besides the common requirements of organisational knowledge graphs, such as being able to capture the domain precisely and giving users the ability to search and query the data, the focus on handling multiple use cases and supporting use case-specific machine learning models presents additional challenges: the data models must also be streamlined for the performance of downstream tasks; graph content must be easily customisable for different use cases; different projections of the graph content are required to support a wider range of consumption modes. In this paper we describe our main design choices in implementing the BIKG graph and discuss different aspects of its life cycle: from graph construction to exploitation.

“…To evaluate our system, we first measure the performance of our triple extractor against two state-of-the-art systems, OpenIE6 [55] and IMoJIE [56], on two standard benchmark data sets. Next, we use the PubMed abstracts dataset to demonstrate the qualitative advantages of our enhancements, in comparison to these systems and to show that our approach generalizes well for a diverse set of datasets.…”

Section: Methods (mentioning; confidence: 99%)

LILLIE: Information extraction and database integration using linguistics and learning-based algorithms

Smith, Papadopoulos, Braschler, et al. 2022

Information Systems

Querying both structured and unstructured data via a single common query interface such as SQL or natural language has been a long-standing research goal. Moreover, as methods for extracting information from unstructured data become ever more powerful, the desire to integrate the output of such extraction processes with "clean", structured data grows. We are convinced that for successful integration into databases, such extracted information in the form of "triples" needs to (1) be of high quality and (2) have the necessary generality to link up with varying forms of structured data. Our approach breaks new ground in combining both of these aspects, which have usually been treated in isolation. The cornerstone of our work is a novel, generic method for extracting open information triples from unstructured text, using a combination of linguistics and learning-based extraction methods, thus uniquely balancing both precision and recall. Our system, called LILLIE (LInked Linguistics and Learning-Based Information Extractor), uses dependency tree modification rules to refine triples from a high-recall learning-based engine, and combines them with syntactic triples from a high-precision engine to increase effectiveness. In addition, our system features several augmentations, which modify the generality and the degree of granularity of the output triples. Even though our focus is on addressing both quality and generality simultaneously, our new method substantially outperforms current state-of-the-art systems on the two widely used CaRB and Re-OIE16 benchmark sets for information extraction. We have made our code publicly available to facilitate further research.

“…Many open information extraction (OIE) systems, e.g., Stanford OpenIE (Angeli et al, 2015), OLLIE (Schmitz et al, 2012), Reverb (Fader et al, 2011), and their descendant Open IE4 leverage carefully-designed linguistic patterns (e.g., based on dependencies and POS tags) to extract triples from textual corpora without using additional training sets. Recently, supervised OIE systems (Stanovsky et al, 2018;Ro et al, 2020;Kolluru et al, 2020) formulate the OIE as a sequence generation problem using neural networks trained on additional training sets. Similar to our work, Wang et al (2020) use the parameters of LMs to extract triples, with the main difference that DEEPEX not only improves the recall of the beam search, but also uses a pre-trained ranking model to enhance the zero-shot capability.…”

Section: Related Work (mentioning; confidence: 99%)

Zero-Shot Information Extraction as a Unified Text-to-Triple Translation

Wang, Liu, Chen, et al. 2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We cast a suite of information extraction tasks into a text-to-triple translation framework. Instead of solving each task by relying on task-specific datasets and models, we formalize the task as a translation between task-specific input text and output triples. By taking the task-specific input, we enable a task-agnostic translation by leveraging the latent knowledge that a pre-trained language model has about the task. We further demonstrate that a simple pre-training task of predicting which relational information corresponds to which input text is an effective way to produce task-specific outputs. This enables the zero-shot transfer of our framework to downstream tasks. We study the zero-shot performance of this framework on open information extraction (OIE2016, NYT, WEB, PENN), relation classification (FewRel and TACRED), and factual probe (Google-RE and T-REx). The model transfers non-trivially to most tasks and is often competitive with a fully supervised method without the need for any task-specific training. For instance, we significantly outperform the F1 score of the supervised open information extraction without needing to use its training set.
