Understanding the Search Interfaces of the Deep Web Based on Domain Model

Yuan, Xiaojie; Hui-bin, Zhang; Yang, Zongyun; Wen, Yangping

doi:10.1109/icis.2009.32

Cited by 3 publications

(4 citation statements)

References 9 publications

“…Because of the diversification of web page, it is difficult to guarantee that this premise will continue to be used in all web pages. The research of XiaoJie Yuan [11] and others builds up domain model by using a lot of query interfaces, then extracts words from the query interface, and assembles attribute word labels of the query interfaces into a tree structure according to the similarity degree between words in the domain model and words in the query interface. But this method is limited to the founding of the domain model, besides, the effectiveness of this method remains to be discussed.…”

Section: Related Work (mentioning)

confidence: 99%

“…At present, there are some researches [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20] about query interface schema of Deep Web, but most of them concentrate on the study of query interface integration of Deep Web. While as the key step before the query interface integration, there are not so many studies about the extraction and establishment of query interface schema.…”

Section: Related Work (mentioning)

confidence: 99%

Research on Extract the Schema of Query Interfaces

2015

2015 10th International Conference on Intelligent Systems and Knowledge Engineering (ISKE)

The main way to obtain Deep Web data is to fill in the query interface provided by a page and submit a query request to the Deep Web server, so an important step in accessing Deep Web resources is to analyse the query request of the Deep Web server effectively. However, query interfaces are designed under different schemas and use different languages, which makes high-precision extraction of the query interface schema difficult. To improve the accuracy of schema extraction and to interpret query interfaces at the semantic level, this paper proposes a new definition of query interface schema and designs a schema extraction method based on the visual information of the query interface and page information. The experiments use the TEL-8 data set of UIUC, and the results show that the method reaches over 90% accuracy in different areas, and over 95% in some, indicating good feasibility and practicability.

“…Forms are primarily designed for human beings, but they must also be understood by automated agents for various applications such as general-purpose indexing of response pages, focused indexing [13], extensional crawling strategies (e.g., Web archiving), automatic construction of ontologies [29], etc. However, most existing approaches to automatically explore and classify the deep Web crucially rely on domain knowledge [10,12,30] to guide form understanding. Moreover, they tend to separate the steps of form interface understanding and information extraction from result pages, although both contribute [27] to a more authentic vision on the backend database schema.…”

Section: Ontologies and The Deep Web (mentioning)

confidence: 99%

Cross-Fertilizing Deep Web Analysis and Ontology Enrichment

Oita, Amarilli, Senellart

2017

Preprint

Deep Web databases, whose content is presented as dynamically generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.

“…In [34], a (manually derived) domain schema is used to guide understanding. In contrast to OPAL, it segments a form purely based on the domain schema (called schema tree).…”

Section: Form Understanding (mentioning)

confidence: 99%

The ontological key: automatically understanding and integrating forms to access the deep Web

et al. 2013

Forms are our gates to the web. They enable us to access the deep content of web sites. Automatic form understanding provides applications, ranging from crawlers over meta-search engines to service integrators, with a key to this content. Yet, it has received little attention other than as component in specific applications such as crawlers or meta-search engines. No comprehensive approach to form understanding exists, let alone one that produces rich models for semantic services or integration with linked open data. In this paper, we present OPAL, the first comprehensive approach to form understanding and integration. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems OPAL pushes the state of the art: For form labeling, it combines features from the text, structure, and visual rendering of a web page. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern web forms OPAL outperforms previous approaches for form labeling by a significant margin. For form interpretation, OPAL uses a schema (or ontology) of forms in a given domain. Thanks to this domain schema, it is able to produce nearly perfect (> 97% accuracy in the evaluation domains) form interpretations. Yet, the effort to produce a domain schema is very low, as we provide a Datalog-based template language that eases the specification of such schemata and a methodology for deriving a domain schema largely automatically from an existing domain ontology. We demonstrate the value of OPAL's form interpretations through a light-weight form integration system that successfully translates and distributes master queries to hundreds of forms with no error, yet is implemented with only a handful translation rules.
