Proceedings of the Fourth ACM International Conference on Web Search and Data Mining 2011
DOI: 10.1145/1935826.1935869

Scalable knowledge harvesting with high precision and high recall

Cited by 150 publications (129 citation statements)
References 36 publications
“…Semi-supervised bootstrapping approaches such as NELL [5], PROSPERA [15] and BOA [9] start with a set of seed natural language patterns, then employ an iterative approach to both extract information for those patterns and learn new patterns. For NELL and PROSPERA, the patterns and underlying schema are created manually, whereas they are created automatically for BOA by using knowledge contained in DBpedia.…”
Section: Related Work (mentioning)
confidence: 99%
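The iterative scheme described in this statement (seed patterns, extraction, pattern learning) can be illustrated with a toy loop. This is a minimal sketch under strong assumptions, not the algorithm of NELL, PROSPERA, or BOA: entities are single whitespace-separated tokens, a pattern is simply the text between two tokens, and no pattern scoring is applied. The function name `bootstrap` and the sample sentences are hypothetical.

```python
def bootstrap(sentences, seed_patterns, iterations=2):
    """Toy bootstrapping loop (illustrative assumption, not a real system).

    A pattern is the token sequence found between two entity tokens.
    Each iteration first extracts facts with the known patterns, then
    learns new patterns from sentences that mention known facts.
    """
    patterns = set(seed_patterns)
    facts = set()
    tokenized = [s.split() for s in sentences]

    for _ in range(iterations):
        # Step 1: extract (left, right) pairs wherever a known pattern occurs.
        for tokens in tokenized:
            for i in range(len(tokens)):
                for j in range(i + 2, len(tokens)):
                    middle = " ".join(tokens[i + 1:j])
                    if middle in patterns:
                        facts.add((tokens[i], tokens[j]))

        # Step 2: learn new patterns from sentences containing known facts.
        for tokens in tokenized:
            for i in range(len(tokens)):
                for j in range(i + 2, len(tokens)):
                    if (tokens[i], tokens[j]) in facts:
                        patterns.add(" ".join(tokens[i + 1:j]))

    return facts, patterns


sentences = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "Berlin , capital city of Germany",
    "Rome , capital city of Italy",
]
facts, patterns = bootstrap(sentences, {"is the capital of"})
```

In this example the second iteration finds the pair (Rome, Italy) only because the first iteration learned a new pattern from a sentence about Berlin and Germany. Because nothing here scores or filters the learned patterns, a single bad pattern would propagate freely.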
“…In each iteration, NELL uses the available instance knowledge to retrieve new instances of existing categories and relations between known instances by using pattern harvesting. The approach followed by PROSPERA [17] is similar to that of NELL but relies on the iterative harvesting of n-grams-itemset patterns. These patterns allow to generalize NL patterns found in text without introducing more noise into the patterns during the generalization process.…”
Section: Related Work (mentioning)
confidence: 99%
“…When using BOA iteratively, the output of each RDF generation would provide parts of the input for the subsequent extraction process. In previous work, semantic drift has been shown to be one of the key problems of such iterative approaches [4,17]. In order to maintain a high precision and to avoid semantic drift within the BOA framework, we solely select the top-n percent of all scored patterns for generating RDF.…”
Section: RDF Generation (mentioning)
confidence: 99%
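The top-n percent selection mentioned in this statement can be sketched as a simple ranking cut-off. This is an illustration only: the quoted text does not say how BOA scores patterns or how it rounds the cut-off, so the score dictionary, the function name `select_top_percent`, and the round-up rule are all assumptions.

```python
import math


def select_top_percent(scored_patterns, percent):
    """Return the highest-scoring `percent` percent of patterns.

    `scored_patterns` maps a pattern string to a numeric score.
    Rounding up (so that a non-empty input keeps at least one pattern)
    is an assumption of this sketch, not something the source states.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    # Rank patterns from highest to lowest score.
    ranked = sorted(scored_patterns.items(), key=lambda item: item[1], reverse=True)
    keep = math.ceil(len(ranked) * percent / 100)
    return [pattern for pattern, _score in ranked[:keep]]


# Hypothetical scores for four patterns.
scores = {
    "is the capital of": 0.9,
    ", capital city of": 0.7,
    "is near": 0.5,
    "and": 0.1,
}
selected = select_top_percent(scores, 50)
```

With these made-up scores, a 50 percent cut-off keeps the two highest-scoring patterns and discards the generic ones, which is the kind of filtering the statement describes as a guard against semantic drift.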
“…The process of populating a structured relational database from unstructured sources has received renewed interest in the database community through high-profile start-up companies (e.g., Tamr and Trifacta), established companies like IBM's Watson [7,16], and a variety of research efforts [11,25,28,36,40]. At the same time, communities such as natural language processing and machine learning are attacking similar problems under the name knowledge base construction (KBC) [5,14,23].…”
Section: Introduction (mentioning)
confidence: 99%
“…DeepDive's language and execution model are similar to other KBC systems: DeepDive uses a high-level declarative language [11,28,30]. From a database perspective, DeepDive's language is based on SQL.…”
Section: Introduction (mentioning)
confidence: 99%