Two-Stage Named-Entity Recognition Using Averaged Perceptrons

Buitinck, Lars; Marx, Maarten

doi:10.1007/978-3-642-31178-9_17

Cited by 5 publications

(8 citation statements)

References 3 publications

“…Experiment I: Sampling pseudo-ground truth. Our first experiment aims to answer RQ1: What is the utility of our sampling methods for generating pseudo-ground truth for a named entity recognizer?…” (Footnote 4 in the source: "Using the smallest KB s (20%) results in about 15,000 tweets in the pseudo-ground truth.")

Section: Results (mentioning)

confidence: 99%

“…We cater for this bias by randomly sampling 10,000 tweets from both the test set and the pseudo-ground truth and repeating our experiments ten times. Ground truth is then assembled by linking the corpus of tweets using KB. This ground truth consists of 82,305 tweets, with 12,488 unique concepts.…”

Section: Methods (mentioning)

confidence: 99%

“…Now we can proceed and train NERC with our generated pseudo-ground truth. We do so using a two-stage approach [4] where the recognition stage is implemented using the fast structured perceptron algorithm [7].…”

Section: Unsupervised Generation Of Pseudo-ground Truth (mentioning)

confidence: 99%

Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams

Graus, Tsagkias, Buitinck, et al. 2014. Lecture Notes in Computer Science. Self-citation.

The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.

“…For the extraction, we employed the named entity recognizer of [2]. We chose this tool as it is one of the strongest named entity recognizers in the Dutch language area with a reported F1-score of 83.56% (see [2] for a comparison with other systems).…”

Section: Named Entity Extraction (mentioning)

confidence: 99%

“…We chose this tool as it is one of the strongest named entity recognizers in the Dutch language area with a reported F1-score of 83.56% (see [2] for a comparison with other systems). We used a preliminary version of the annotations in the SoNaR corpus [11] as a training set for the NER tagger; this set is annotated according to a rich NER tagging scheme that distinguishes the categories person, location, organisation, product, event and miscellaneous.…”

Section: Named Entity Extraction (mentioning)

confidence: 99%

Linking the kingdom

Boer, Doornik, Buitinck, et al. 2013. Proceedings of the Seventh International Conference on Knowledge Capture. Self-citation.

Digital history is a branch of digital humanities concerned with using ICT to improve the study of history. Linked Data provides an effective way to offer enriched digital access to scientific texts about history (historiographies). In this paper, we present a method for connecting a historiographical text to the Linked Data cloud. We present the method and tools that we use in each of the method's steps. We focus on one extensive case study: the enriched access of an important work of Dutch World War II historiography, "Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog". We describe the digitization and present two sources of structured knowledge that link to individual text sources, retrievable on the Web of Data. The first is the manually constructed and highly curated "Back of the Book Index". The second is a list of extracted Named Entities. We compare both structured sources as stepping stones to the Web of Data and present a number of use cases relevant both to historical researchers and to the general public.

Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection

2018. Lecture Notes in Computer Science.
