Cardinal Virtues: Extracting Relation Cardinalities from Text

Mirza, Paramita; Razniewski, Simon; Darari, Fariz; Weikum, Gerhard

doi:10.18653/v1/p17-2055

Cited by 12 publications

(13 citation statements)

References 12 publications


“…In predicting counting quantifiers through recognizing cardinals in text, CINEX-CRF achieves 55-85% precision. This is a considerable improvement (up to 48.9 percentage points) compared to the baseline [22]. Although the baseline yields a comparable coverage, its low precision suggests that it has difficulties to pick up correct context and produces some matches only by chance.…”

Section: Discussion (mentioning)

confidence: 89%

“…Thus, we define relations in our experiments as pairs of a Wikidata subject type/class and a Wikidata property. We focus on five diverse relations (listed in Table 1 under the Relation column) using the four Wikidata properties already used in [22], but using two specific Wikidata classes for the overloaded has part property, i.e., series of creative works and musical ensemble. We use four sets of entities for training and evaluation: For the manual test set we manually annotated mentions in text that correspond to counting quantifiers, and established the correct object count from Wikipedia.…”

Section: Methods (mentioning)

confidence: 99%

“…While NELL, for instance, knows 13 relations about the number of casualties and injuries in disasters, they all contain only seed facts and no learned facts. In [22], which we use as baseline for our experiments, we have proposed a single-stage process for identifying numbers that express relation counts. Yet, we there only consider explicit cardinals and do not tackle training data incompleteness nor compositionality, thus achieving only moderate precision and coverage.…”

Section: Related Work (mentioning)

confidence: 99%


Enriching Knowledge Bases with Counting Quantifiers

Mirza, Razniewski, Darari, et al. 2018

Lecture Notes in Computer Science

Self Cite


Information extraction traditionally focuses on extracting relations between identifiable entities, such as ⟨Monterey, locatedIn, California⟩. Yet, texts often also contain counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, "California is divided into 58 counties". Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.

arXiv:1807.03656v1 [cs.CL] 10 Jul 2018

Second, an important use case is KB curation [8,34]. KBs are notoriously incomplete, contain erroneous triples, and are limited in keeping up with the pace of real-world changes. Counting information helps to identify gaps and inaccuracies. For example, knowing the exact number of counties in California or a lower bound for the number of films directed by Eastwood are important cues to complete and enrich a KB.

State-of-the-Art and Challenges.

The predominant approach to extracting facts for KB population is distant supervision, using seeds for the SPO triples of interest (e.g., [21,32]). The seeds are usually taken from an initial KB or are manually compiled. Spotting the seeds in a text corpus (e.g., Clint Eastwood, directed and Gran Torino) then allows learning patterns for relations (e.g., "director of" or "someone's masterpiece"), which in turn lead to observing new fact candidates. This methodology is known as the pattern-relation duality principle [2].

Distant supervision is a natural approach for extracting counting information as well: the cardinality of distinct O arguments for a given SP pair, n := |{O | ⟨S, P, O⟩ ∈ KB}|, serves as a seed for the counting assertion ⟨S, P, ∃n⟩. However, it is more challenging than traditional SPO-fact extraction and needs to cope with several issues:

1) Non-maximal seeds: Unlike for SPO-fact extraction, the incompleteness of KBs not only leads to a reduction in the number of seeds, but to seeds that systematically underestimate the count of facts that are valid in reality. For example, a KB that knows only a subset of Trump's children, say three out of five, leads to a non-maximal s...
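The seed definition in the abstract above, the number of distinct objects per subject-predicate pair, can be illustrated with a minimal sketch. This is not the CINEX implementation: the function name seed_counts, the toy triples, and the containsCounty predicate are assumptions made only for illustration.

```python
from collections import defaultdict

def seed_counts(triples):
    """Map each (subject, predicate) pair to its number of distinct objects.

    Illustrative helper (not from the paper): computes
    n = |{O | (S, P, O) in KB}| for every SP pair in the input.
    """
    objects = defaultdict(set)
    for s, p, o in triples:
        objects[(s, p)].add(o)  # a set ignores repeated triples
    return {sp: len(objs) for sp, objs in objects.items()}

# Toy KB; the predicate names are hypothetical.
kb = [
    ("Clint Eastwood", "directed", "Gran Torino"),
    ("Clint Eastwood", "directed", "Unforgiven"),
    ("Clint Eastwood", "directed", "Gran Torino"),  # duplicate triple
    ("California", "containsCounty", "Monterey County"),
]
counts = seed_counts(kb)
```

Because a KB is typically incomplete, each count obtained this way is best read as a lower bound on the true cardinality, which is the non-maximal seed problem the abstract describes.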


“…Galárraga et al [48] investigated various signals, such as popularity, update frequency, and cardinality, that can be used to identify complete parts of a KB via rule-mining techniques. Mirza et al [49,50] developed techniques for relation cardinality extraction from text, which can be leveraged to generate completeness statements in the following way: when the extracted cardinality of a relation matches with the relation count in a KB, then a completeness statement can be generated. COOL-WD is a collaborative, web-based system for managing and consuming completeness information about Wikidata, which currently stores over 10,000 real completeness statements [51], and is available at http://cool-wd.inf.unibz.it.…”

Section: Creation Of Completeness Information (mentioning)

confidence: 99%
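The matching rule in the citation statement above (generate a completeness statement when the extracted cardinality equals the KB count) might be sketched as follows. This is a rough sketch, not the authors' system: the function name, the dictionary layout, and the child and county predicate labels are assumptions, and the numbers reuse the examples from the abstracts on this page (58 counties; three of five children known).

```python
def completeness_statements(extracted, kb_counts):
    """Return (subject, predicate) pairs whose KB count equals the
    cardinality extracted from text.

    Illustrative helper (hypothetical name and data layout).
    """
    return sorted(
        sp for sp, n in extracted.items()
        if kb_counts.get(sp, 0) == n  # missing pairs count as zero facts
    )

# Cardinalities extracted from text (toy values).
extracted = {("California", "county"): 58, ("Trump", "child"): 5}
# Object counts currently stored in the KB (toy values).
kb_counts = {("California", "county"): 58, ("Trump", "child"): 3}

complete = completeness_statements(extracted, kb_counts)
```

Under these toy inputs only the California pair qualifies, since the KB holds three of the five extracted children for the other pair.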

Completeness and soundness guarantees for conjunctive SPARQL queries over RDF data sources with completeness statements

Darari, Nutt, Razniewski, et al. 2020

Self Cite


RDF generally follows the open-world assumption: information is incomplete by default. Consequently, SPARQL queries cannot retrieve with certainty complete answers, and even worse, when they involve negation, it is unclear whether they produce sound answers. Nevertheless, there is hope to lift this limitation. On many specific topics (e.g., children of Trump, Apollo 11 crew, EU founders), RDF data sources contain complete information, a fact that can be made explicit through completeness statements. In this work, we leverage completeness statements over RDF data sources to provide guarantees of completeness and soundness for conjunctive SPARQL queries. We develop a technique to check whether query completeness can be guaranteed by taking into account also the specifics of the queried graph, and analyze the complexity of such checking. For queries with negation, we approach the problem of query soundness checking, and distinguish between answer soundness (i.e., is an answer of a query sound?) and pattern soundness (i.e., is a query as a whole sound?). We provide a formalization and characterize the soundness problem via a reduction to the completeness problem. We further develop heuristic techniques for completeness checking, and conduct experimental evaluations based on Wikidata, a prominent, real-world knowledge base, to demonstrate the feasibility of our approach.


“…ac.id/. Future directions of this work include the incorporation of supervised (or semi-supervised) approaches for specific steps of KOI such as the extraction of numeral information (Mirza et al, 2017), as well as the investigation of applying our approach to other domains such as disease outbreaks and natural disasters.…”

Section: Results (mentioning)

confidence: 99%

KOI at SemEval-2018 Task 5: Building Knowledge Graph of Incidents

Mirza, Darari, Mahendra 2018

Proceedings of the 12th International Workshop on Semantic Evaluation

Self Cite


We present KOI (Knowledge of Incidents), a system that given news articles as input, builds a knowledge graph (KOI-KG) of incidental events. KOI-KG can then be used to efficiently answer questions such as "How many killing incidents happened in 2017 that involve Sean?" The required steps in building the KG include: (i) document preprocessing involving word sense disambiguation, named-entity recognition, temporal expression recognition and normalization, and semantic role labeling; (ii) incidental event extraction and coreference resolution via document clustering; and (iii) KG construction and population.
