Proceedings of the Seventh International Conference on Information and Knowledge Management 1998
DOI: 10.1145/288627.288641

Ontology-based extraction and structuring of information from data-rich unstructured documents

Abstract: We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontology, we formulate rules to extract constants and context keywords from unstructured documents. For each unstructured document of interest, we extract its constants and keywords and apply a recognizer to organize extracted constants as attribute values of tuples in a generated database schema. To make our approach general, we fix all the…
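The pipeline the abstract outlines (ontology-derived extraction rules, constant and keyword extraction, then a recognizer that fills tuples of a generated schema) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the car-ads object sets `Year`, `Make`, `Mileage`, `Price`, their regular expressions, the rule that a context keyword must appear within a few characters *after* a constant, and the record-boundary heuristic in `recognize` (start a new tuple whenever the hypothetical `record_key` attribute repeats) are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectSet:
    """One lexical object set of a hypothetical application ontology."""
    name: str
    pattern: str                                        # regex recognizing constants
    keywords: List[str] = field(default_factory=list)   # required context keywords, if any


# Hypothetical car-ads ontology; names and patterns are illustrative only.
CAR_ADS = [
    ObjectSet("Year", r"\b(?:19|20)\d{2}\b"),
    ObjectSet("Make", r"\b(?:Ford|Honda|Toyota)\b"),
    ObjectSet("Mileage", r"\b\d{1,3}(?:,\d{3})+\b", ["miles"]),
    ObjectSet("Price", r"\$\d{1,3}(?:,\d{3})*"),
]

Hit = Tuple[int, str, str]  # (character offset, object-set name, constant)


def extract(text: str, ontology: List[ObjectSet], window: int = 10) -> List[Hit]:
    """Apply every extraction rule. A constant whose object set lists keywords is
    kept only if one of them occurs within `window` characters after it
    (an assumed rule). Hits are returned in document order."""
    hits: List[Hit] = []
    for obj in ontology:
        for m in re.finditer(obj.pattern, text):
            if obj.keywords:
                following = text[m.end():m.end() + window].lower()
                if not any(kw in following for kw in obj.keywords):
                    continue  # no context keyword nearby: reject this constant
            hits.append((m.start(), obj.name, m.group()))
    return sorted(hits)


def recognize(hits: List[Hit], ontology: List[ObjectSet],
              record_key: str) -> Tuple[List[str], List[Tuple[Optional[str], ...]]]:
    """Group constants into tuples of a schema generated from the ontology.
    Assumed heuristic: a repeated `record_key` value starts a new tuple."""
    columns = [o.name for o in ontology]   # generated schema: one column per object set
    rows: List[Tuple[Optional[str], ...]] = []
    current: Dict[str, str] = {}
    for _, name, value in hits:
        if name == record_key and record_key in current:
            rows.append(tuple(current.get(c) for c in columns))
            current = {}
        current.setdefault(name, value)    # keep the first value seen per attribute
    if current:
        rows.append(tuple(current.get(c) for c in columns))
    return columns, rows


ad = "1995 Honda Accord, 85,000 miles, $4,500. 1998 Ford Escort, 60,000 miles, asking $3,200."
columns, rows = recognize(extract(ad, CAR_ADS), CAR_ADS, record_key="Year")
print(columns)   # ['Year', 'Make', 'Mileage', 'Price']
for row in rows:
    print(row)
# ('1995', 'Honda', '85,000', '$4,500')
# ('1998', 'Ford', '60,000', '$3,200')
```

Note how the keyword rule does the disambiguation: `4,500` also matches the hypothetical `Mileage` pattern, but no "miles" follows it, so it survives only as the `Price` constant `$4,500`.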

Cited by 108 publications (58 citation statements). References: 24 publications.
“…In previous works we showed how to improve performance in document analysis and understanding by using semantic context models [WM00]. One of the first ideas for using domain ontologies in information extraction have been described by [ECSL98]. Information extraction as such has been implemented by regarding [AI99].…”
Section: Related Work (mentioning)
confidence: 99%
“…The essence of this problem boils down to organizing the concepts and concept instances in a HTML document into a (labeled) semantic partition tree. There are a number of areas related to this problem, namely, XML schema discovery [15,26,14,27], schema inference from HTML documents [8,2], wrapper construction [17,7,25], record boundary detection in HTML documents [12,11,10,4], and semantic annotation of HTML documents [18,19,9]. However, our approach departs from all the related works above in several respects. Firstly, our main focus is on template-based content-rich HTML documents.…”
Section: Related Work (mentioning)
confidence: 99%
“…[Embley, et al, 1998] [Woods 2000] presents positive results for the creation of large-scale subsumption (i.e., abstraction) hierarchies from lexical and phrasal analysis of free text. By analyzing relationships among constituents of phrases and compound morphemes, lexical strings from text can be automatically placed at appropriate levels of generality within a hierarchy encoding subsumption relationships.…”
mentioning
confidence: 99%