We present a comprehensive analysis of approaches for discovering links in document collections. We classify link discovery approaches with respect to the type of knowledge being used: the text of a document, its title, and already existing links. Using an evaluation dataset derived from Wikipedia, we find that link-based approaches outperform all other approaches if they can draw knowledge from a very large number of already existing links. Simulating other document collections with fewer links, we show that text-based approaches yield better results. Furthermore, we argue that knowledge from Wikipedia cannot necessarily be applied to other domains, such as corporate intranets. Thus, we conclude that text-based approaches are the best choice for reliable link discovery in arbitrary document collections.
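As a concrete illustration of the simplest purely text- and title-based strategy, here is a minimal sketch under our own assumptions (the class TitleMatchLinker and its methods are hypothetical and not one of the systems evaluated in the paper): a link is proposed wherever the title of another document occurs verbatim in a document's text.

    import java.util.*;

    // Minimal sketch of a title-based link discovery heuristic (hypothetical, for illustration only):
    // propose a link wherever the text of a document contains the title of another document.
    public class TitleMatchLinker {

        // Maps a lower-cased target document title to its document id (insertion order preserved).
        private final Map<String, String> titleIndex = new LinkedHashMap<>();

        public void addTargetDocument(String docId, String title) {
            titleIndex.put(title.toLowerCase(Locale.ROOT), docId);
        }

        // Returns the ids of all target documents whose title occurs verbatim in the given text.
        public Set<String> discoverLinks(String text) {
            String lowerText = text.toLowerCase(Locale.ROOT);
            Set<String> links = new LinkedHashSet<>();
            for (Map.Entry<String, String> e : titleIndex.entrySet()) {
                if (lowerText.contains(e.getKey())) {
                    links.add(e.getValue());
                }
            }
            return links;
        }

        public static void main(String[] args) {
            TitleMatchLinker linker = new TitleMatchLinker();
            linker.addTargetDocument("doc-42", "link discovery");
            linker.addTargetDocument("doc-7", "corporate intranet");
            System.out.println(linker.discoverLinks(
                "Link discovery in a corporate intranet differs from Wikipedia."));
            // prints [doc-42, doc-7]
        }
    }

Such a text-based heuristic needs no existing links at all, which is why it transfers to sparsely linked collections, at the cost of missing links whose anchor text differs from the target title.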
DKPro Keyphrases is a keyphrase extraction framework based on UIMA. It offers a wide range of state-of-the-art keyphrase extraction approaches. At the same time, it is a workbench for developing new extraction approaches and evaluating their impact. DKPro Keyphrases is publicly available under an open-source license. The accompanying listing fragment (the tail of an experiment-setup example; its opening lines, which declare further dimensions of the parameter space and create the candidate selection task, are not included here) wires the preprocessing, filtering, and ranking tasks into a DKPro Lab batch run:

    // Opening of the original listing truncated (further dimensions, candidate selection task).
    ParameterSpace params = new ParameterSpace(
        ...,
        Dimension.create("evalType", EvaluatorType.Lemma));

    // Create the tasks and connect each task's output to the input of its consumer.
    Task preprocessingTask = new PreprocessingTask();
    Task filteringTask = new KeyphraseFilteringTask();
    candidateSelectionTask.addImport(preprocessingTask, PreprocessingTask.OUTPUT, KeyphraseFilteringTask.INPUT);
    Task keyphraseRankingTask = new KeyphraseRankingTask();
    keyphraseRankingTask.addImport(filteringTask, KeyphraseFilteringTask.OUTPUT, KeyphraseRankingTask.INPUT);

    // Assemble the batch, attach the evaluation report, and run all parameter combinations.
    BatchTask batch = new BatchTask();
    batch.setParameterSpace(params);
    batch.addTask(preprocessingTask);
    batch.addTask(candidateSelectionTask);
    batch.addTask(keyphraseRankingTask);
    batch.addReport(KeyphraseExtractionReport.class);
    Lab.getInstance().run(batch);
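A note on the execution model assumed by this listing (general DKPro Lab behavior, not spelled out in the abstract): the ParameterSpace enumerates the combinations of its declared Dimensions, the BatchTask re-executes the wired tasks for each combination, and the registered report collects the results, which is what makes the framework usable as a workbench for comparing extraction approaches.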
A core assumption of keyphrase extraction is that a concept is more important if it is mentioned more often in a document. However, in languages like German that form long noun compounds, frequency counts can be misleading, as concepts "hidden" inside compounds are not counted. We hypothesize that decompounding before counting term frequencies may lead to better keyphrase extraction. We identify two effects of decompounding: (i) enhanced frequency counts, and (ii) more keyphrase candidates. We create two German evaluation datasets to test our hypothesis and analyze the effect of additional decompounding on keyphrase extraction.
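To make effect (i) concrete, here is a toy sketch under our own assumptions (a greedy longest-match splitter over a tiny hand-made dictionary; it is not the decompounding method or word list used in the paper): after splitting, the constituents of a compound such as "Fahrradwegnetz" also contribute to the counts of "Fahrrad" and "Weg".

    import java.util.*;

    // Toy illustration of how decompounding changes term-frequency counts for German compounds.
    public class DecompoundingCounts {

        // Tiny hand-made dictionary of constituent words (lower-cased).
        private static final Set<String> DICTIONARY = new HashSet<>(Arrays.asList(
            "fahrrad", "weg", "netz"));

        // Greedy left-to-right longest-match split; returns the word itself if no full split is found.
        static List<String> split(String word) {
            List<String> parts = new ArrayList<>();
            String rest = word.toLowerCase(Locale.GERMAN);
            while (!rest.isEmpty()) {
                String match = null;
                for (int end = rest.length(); end > 0; end--) {
                    String prefix = rest.substring(0, end);
                    if (DICTIONARY.contains(prefix)) { match = prefix; break; }
                }
                if (match == null) {
                    return Collections.singletonList(word.toLowerCase(Locale.GERMAN));
                }
                parts.add(match);
                rest = rest.substring(match.length());
            }
            return parts;
        }

        public static void main(String[] args) {
            String[] tokens = { "Fahrradwegnetz", "Fahrrad", "Weg" };
            Map<String, Integer> plain = new TreeMap<>();
            Map<String, Integer> decompounded = new TreeMap<>();
            for (String t : tokens) {
                plain.merge(t.toLowerCase(Locale.GERMAN), 1, Integer::sum);
                for (String part : split(t)) {
                    decompounded.merge(part, 1, Integer::sum);
                }
            }
            System.out.println("without decompounding: " + plain);
            // {fahrrad=1, fahrradwegnetz=1, weg=1}
            System.out.println("with decompounding:    " + decompounded);
            // {fahrrad=2, netz=1, weg=2}
        }
    }

The split constituents could likewise be added to the set of keyphrase candidates, which corresponds to effect (ii).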
In this paper, we investigate the difference between word and sense similarity measures and present means to convert a state-of-the-art word similarity measure into a sense similarity measure. To evaluate the new measure, we create a dedicated sense similarity dataset and re-rate an existing word similarity dataset using two different sense inventories, WordNet and Wikipedia. We find that word-level measures are not able to differentiate between different senses of a word, while sense-level measures actually increase correlation when shifting to sense similarities. Sense-level similarity measures improve when evaluated against the re-rated, sense-aware gold standard, while the correlation of word-level measures decreases.
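One simple way to lift a word-level measure to the sense level is sketched below under our own assumptions (the class SenseSimilarity and the exact-match word measure are placeholders, and this is not necessarily the conversion proposed in the paper): represent each sense by the content words of its gloss and average the pairwise word similarities between the two glosses.

    import java.util.*;
    import java.util.function.BiFunction;

    // Sketch of lifting a word-level similarity measure to the sense level:
    // represent each sense by its gloss words and average the pairwise word similarities.
    public class SenseSimilarity {

        // wordSim is any word-level measure returning a value in [0, 1].
        static double senseSim(Set<String> glossA, Set<String> glossB,
                               BiFunction<String, String, Double> wordSim) {
            if (glossA.isEmpty() || glossB.isEmpty()) {
                return 0.0;
            }
            double sum = 0.0;
            for (String a : glossA) {
                for (String b : glossB) {
                    sum += wordSim.apply(a, b);
                }
            }
            return sum / (glossA.size() * glossB.size());
        }

        public static void main(String[] args) {
            // Placeholder word-level measure: exact string match (stands in for a real measure).
            BiFunction<String, String, Double> wordSim = (a, b) -> a.equals(b) ? 1.0 : 0.0;

            // Two senses of "bank", each represented by a few hand-picked gloss words.
            Set<String> bankFinance = new HashSet<>(Arrays.asList("money", "institution", "deposit"));
            Set<String> bankRiver   = new HashSet<>(Arrays.asList("river", "slope", "land"));
            Set<String> lender      = new HashSet<>(Arrays.asList("money", "institution", "loan"));

            System.out.printf("bank(finance) vs lender: %.2f%n", senseSim(bankFinance, lender, wordSim));
            System.out.printf("bank(river)   vs lender: %.2f%n", senseSim(bankRiver, lender, wordSim));
            // prints 0.22 and 0.00
        }
    }

Because the two senses of "bank" get different gloss representations, the lifted measure can assign them different similarities to "lender", which a purely word-level measure cannot do.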