Information extraction challenges in managing unstructured data

Doan, AnHai; Naughton, Jeffrey F.; Ramakrishnan, Raghu; Baid, Akanksha; Chai, Xiaoyong; Chen, Fei; Chen, Ting; Chu, Eric; DeRose, Pedro; Gao, Byron J.; Gokhale, Chaitanya S.; Huang, Joe Chun‐Chia; Shen, Warren; Vuong, Ba-Quy

doi:10.1145/1519103.1519106

Cited by 54 publications

(33 citation statements)

References 11 publications


“…There is a plethora of research in information extraction [11,14], entity resolution [15], schema mapping [16,19] and, in general, information integration [17]. While Midas relies on technologies and ideas from these areas, our main contribution can be seen in the synergistic use of both unstructured and structured information integration to build a comprehensive solution for the financial domain that brings out the value of the data in public sources.…”

Section: Introduction (mentioning)

confidence: 99%

Extracting, Linking and Integrating Data from Public Sources: A Financial Case Study

Burdick, Hernández, Ho et al. 2015

SSRN Journal


“…., (p_ij, (l_ijr; d_ijr), v_ijr)} The proposed TRSs for unstructured data are simple and straight forward. Unlike Information Extraction (IE) tool, we translate the unstructured data into the collection of triples without extracting the structure from the data (Grishman, 1997; Doan et al, 2009a; Doan et al, 2009b; Al-Mathami, 1998), because the existing IE tools have the following disadvantages (Kastrati et al, 2011): first, such approaches are costly due to a very large collection of data have high preprocessing cost, second, automatic extraction of structure is a source of uncertainty (Sarma et al, 2009), and third, they consist of out-of-dated version of extracted data already stored in somewhere. Therefore, we have adopted an approach proposed by F. Kastrati et.…”

Section: Unstructured Data Model (mentioning)

confidence: 99%

Transformation rules for decomposing heterogeneous data into triples

Singh, Jain 2015

Journal of King Saud University - Computer and Information Scie


In order to fulfill the vision of a dataspace system, it requires a flexible, powerful and versatile data model that is able to represent a highly heterogeneous mix of data such as databases, web pages, XML, deep web, and files. In literature, the triple model was found a suitable candidate for a dataspace system, and able to represent structured, semi-structured and unstructured data into a single model. A triple model is based on the decomposition theory, and represents variety of data into a collection of triples. In this paper, we have proposed a decomposition algorithm for expressing various heterogeneous data models into the triple model. This algorithm is based on the decomposition theory of the triple model. By applying the decomposition algorithm, we have proposed a set of transformation rules for the existing data models. The transformation rules have been categorized for structured, semi-structured, and unstructured data models. These rules are able to decompose most of the existing data models into the triple model. We have empirically verified the algorithm as well as the transformation rules on different data sets having different data models. © 2015 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).


“…Cimple [5] and SystemT [11] systems performs RDBMS operations like as joins across extracted facts which are accumulated in database tables. Cimple is UDMS (Unstructured Data Management Systems) project which is used to construct community information management systems using extraction, user interaction and integration.…”

Section: Related Work (mentioning)

confidence: 99%

Extraction of incremental information using query evaluator

Saste, Patil 2014

2014 First International Conference on Networks & Soft Computing (ICNSC2014)


Information Extraction is an activity of examine text for information relevant to some interest. Information extraction needs depth analysis than simple key word searches. The information extraction system recognizes and extracts knowledge from a massive literature and extracted knowledge is accumulated in a knowledge base. Many conventional automatic information extraction approaches using Natural Language Processing and Text Mining technologies have been proposed to extract meaningful information automatically in biomedical realm. These conventional approaches have considerable pitfall that whenever a different extraction goal become visible or any component in system is upgraded, extraction has to be reapplied from beginning to the whole text collection although only a minor part of the text collection might be influenced. In this paper we have applied Stanford dependency grammar to furnish easy description of the grammatical relationships in a sentence. This work relates incremental information extraction approach in which extraction needs are exhibited in the form of database queries. This work aims that in the occasion of installation of a upgraded component, reduction in the processing time takes place as compared to a conventional approach.
