2009
DOI: 10.1145/1519103.1519106

Information extraction challenges in managing unstructured data

Abstract: Over the past few years, we have been trying to build an end-to-end system at Wisconsin to manage unstructured data, using extraction, integration, and user interaction. This paper describes the key information extraction (IE) challenges that we have run into, and sketches our solutions. We discuss in particular developing a declarative IE language, optimizing for this language, generating IE provenance, incorporating user feedback into the IE process, developing a novel wiki-based user interface for feedback, …

Cited by 54 publications (33 citation statements)
References: 11 publications
“…There is a plethora of research in information extraction [11,14], entity resolution [15], schema mapping [16,19] and, in general, information integration [17]. While Midas relies on technologies and ideas from these areas, our main contribution can be seen in the synergistic use of both unstructured and structured information integration to build a comprehensive solution for the financial domain that brings out the value of the data in public sources.…”
Section: Introduction
confidence: 99%
“…, (p_ij, (l_ijr; d_ijr), v_ijr)} The proposed TRSs for unstructured data are simple and straight forward. Unlike Information Extraction (IE) tool, we translate the unstructured data into the collection of triples without extracting the structure from the data (Grishman, 1997; Doan et al, 2009a; Doan et al, 2009b; Al-Mathami, 1998), because the existing IE tools have the following disadvantages (Kastrati et al, 2011): first, such approaches are costly due to a very large collection of data have high preprocessing cost, second, automatic extraction of structure is a source of uncertainty (Sarma et al, 2009), and third, they consist of out-of-dated version of extracted data already stored in somewhere. Therefore, we have adopted an approach proposed by F. Kastrati et.…”
Section: Unstructured Data Model
confidence: 99%
“…Cimple [5] and SystemT [11] systems performs RDBMS operations like as joins across extracted facts which are accumulated in database tables. Cimple is UDMS (Unstructured Data Management Systems) project which is used to construct community information management systems using extraction, user interaction and integration.…”
Section: Related Work
confidence: 99%