FREEBASE contains entities and relation information but is highly incomplete. Relevant information is ubiquitous in web text, but extracting it remains challenging. We present JEDI, an automated system that jointly extracts typed named entities and FREEBASE relations from text using dependency patterns. An innovative method for constraint solving on the entity types of multiple relations is used to disambiguate patterns. The high precision achieved in the evaluation supports our claim that entities and relations can be detected together, alleviating the need to train a custom classifier for each entity type.
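To illustrate how type constraints shared across several relations can disambiguate a pattern, here is a small brute-force sketch. The relation names, type signatures, patterns, and entities are hypothetical toy data, and exhaustive enumeration stands in for whatever solver JEDI actually uses. The only idea shown is that an entity must receive one consistent type across all relations it takes part in.

```python
from itertools import product

# Hypothetical relation type signatures: relation -> (subject type, object type).
SIGNATURES = {
    "born_in": ("person", "location"),
    "founded_in": ("organization", "location"),
    "works_for": ("person", "organization"),
}

# Hypothetical mapping from a dependency pattern to its candidate relations.
CANDIDATES = {
    "X ... in Y": ["born_in", "founded_in"],  # ambiguous pattern
    "X works for Y": ["works_for"],
}


def disambiguate(matches, candidates, signatures):
    """Enumerate relation choices for all matches and keep the type-consistent ones.

    matches: list of (subject entity, pattern, object entity) triples.
    Returns a list of (chosen relations, entity -> type) pairs.
    """
    options = [candidates[pattern] for _, pattern, _ in matches]
    solutions = []
    for choice in product(*options):
        types = {}
        consistent = True
        for (subj, _, obj), rel in zip(matches, choice):
            subj_type, obj_type = signatures[rel]
            for entity, required in ((subj, subj_type), (obj, obj_type)):
                # setdefault keeps the first type seen; a different later type is a conflict.
                if types.setdefault(entity, required) != required:
                    consistent = False
                    break
            if not consistent:
                break
        if consistent:
            solutions.append((choice, types))
    return solutions


matches = [
    ("Siemens", "X ... in Y", "Berlin"),
    ("Werner", "X works for Y", "Siemens"),
]
print(disambiguate(matches, CANDIDATES, SIGNATURES))
# [(('founded_in', 'works_for'), {'Siemens': 'organization', 'Berlin': 'location', 'Werner': 'person'})]
```

Here `born_in` is ruled out because `works_for` already forces Siemens to be an organization. Enumeration is exponential in the number of matches, so a realistic system would need a proper constraint solver rather than this exhaustive search.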
"Big Data" refers to technologies that promise nothing less than the fulfillment of one of the core goals of business and information systems engineering (Wirtschaftsinformatik): to provide the right information to the right recipient at the right time, in the right quantity, at the right place, and in the required quality. Using the phenomenon of "Big Data hubris" as an example, we discuss the technical, economic, and legal prerequisites for fulfilling this promise. Owing to its interdisciplinary self-understanding, business and information systems engineering is ideally positioned to accompany Big Data critically and to use the resulting insights to explain and design innovative information systems in business and public administration, regardless of whether Big Data actually proves to be a disruptive technology or merely a passing fad.
Recognizing fine-grained named entities, e.g., street and city instead of just the coarse type location, has been shown to increase task performance in several contexts. Fine-grained types, however, amplify the problem of data sparsity during training, which is why larger amounts of training data are needed. In this contribution we address the scalability issues caused by these larger training sets. We distribute and parallelize feature extraction and parameter estimation in linear-chain conditional random fields, which are a popular choice for sequence labeling tasks such as named entity recognition (NER) and part-of-speech (POS) tagging. To this end, we employ the parallel stream processing framework Apache Flink, which supports in-memory distributed iterations. Owing to this feature, and unlike prior approaches, our system is iteration-aware during gradient descent. We experimentally demonstrate the scalability of our approach and also validate the parameters learned during distributed training on a fine-grained NER task.
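The log-likelihood gradient of a linear-chain CRF is a sum of per-sequence terms (observed minus expected feature counts), which is what makes data-parallel parameter estimation possible: each partition computes a partial gradient and the partials are summed. The sketch below shows that map/reduce structure on a toy CRF with one emission weight per (label, token id) and one transition weight per label pair, using forward-backward for the expected counts. It does not use Apache Flink; the sequential `map` over `partitions` stands in for workers, all function names are hypothetical, and it performs gradient ascent on the log-likelihood (equivalently, descent on its negative). Keeping the partitions fixed across the loop while only the weights change loosely mirrors what in-memory iterations allow.

```python
import math
from functools import reduce


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def seq_gradient(xs, ys, E, T):
    """Log-likelihood gradient of one sequence: observed minus expected counts."""
    L, n = len(E), len(xs)
    # Forward pass in log space (alpha[t][y] includes the emission at t).
    alpha = [[E[y][xs[0]] for y in range(L)]]
    for t in range(1, n):
        alpha.append([E[y][xs[t]] + logsumexp([alpha[t - 1][a] + T[a][y] for a in range(L)])
                      for y in range(L)])
    # Backward pass in log space.
    beta = [[0.0] * L for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [logsumexp([T[y][b] + E[b][xs[t + 1]] + beta[t + 1][b] for b in range(L)])
                   for y in range(L)]
    log_z = logsumexp(alpha[-1])
    gE = [[0.0] * len(E[0]) for _ in range(L)]
    gT = [[0.0] * L for _ in range(L)]
    for t in range(n):
        gE[ys[t]][xs[t]] += 1.0  # observed emission count
        for y in range(L):
            gE[y][xs[t]] -= math.exp(alpha[t][y] + beta[t][y] - log_z)  # expected count
        if t > 0:
            gT[ys[t - 1]][ys[t]] += 1.0  # observed transition count
            for a in range(L):
                for b in range(L):
                    gT[a][b] -= math.exp(alpha[t - 1][a] + T[a][b] + E[b][xs[t]]
                                         + beta[t][b] - log_z)
    return gE, gT


def add(g1, g2):
    """Reduce step: element-wise sum of two (emission, transition) gradient pairs."""
    return tuple(
        [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
        for m1, m2 in zip(g1, g2)
    )


def partition_gradient(partition, E, T):
    """Map step: sum the gradients of all sequences held by one partition."""
    return reduce(add, (seq_gradient(xs, ys, E, T) for xs, ys in partition))


def train(partitions, num_labels, num_obs, steps=50, lr=0.1):
    E = [[0.0] * num_obs for _ in range(num_labels)]
    T = [[0.0] * num_labels for _ in range(num_labels)]
    for _ in range(steps):
        # Partitions stay in place across iterations; only E and T change per step.
        gE, gT = reduce(add, map(lambda p: partition_gradient(p, E, T), partitions))
        E = [[w + lr * g for w, g in zip(rw, rg)] for rw, rg in zip(E, gE)]
        T = [[w + lr * g for w, g in zip(rw, rg)] for rw, rg in zip(T, gT)]
    return E, T


# Two toy partitions of (token ids, label ids) sequences.
parts = [[([0, 1], [0, 1])], [([1, 0], [1, 0]), ([0, 0], [0, 0])]]
E, T = train(parts, num_labels=2, num_obs=2)
```

Because summation is associative, the partitioned gradient equals the gradient over the pooled data; only the per-step communication of weights and partial gradients differs from single-machine training.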