The ontological key: automatically understanding and integrating forms to access the deep Web

Furche, Tim; Gottlob, Georg; Grasso, Giovanni; Guo, Xiaonan; Orsi, Giorgio; Schallhart, Christian

doi:10.1007/s00778-013-0323-0

Cited by 22 publications

(15 citation statements)

References 33 publications

“…Also, the ability of ROSEANN to annotate different sections of the DOM with different annotation pools, together with its reconciliation capabilities, reduce the noise in the annotations that is the main source of errors in annotation-driven wrapper inducers such as [3]. Figure 6 shows the use of ROSEANN within DIADEM, in particular, for the unsupervised segmentation of classified listings on the web [6] and understanding of forms [5].…”

Section: Applications Of Roseann (mentioning)

confidence: 98%

ROSeAnn. Ortona, Orsi. 2014. Proceedings of the 23rd International Conference on World Wide Web.

Self Cite

Named entity extractors are a popular means for enriching documents with semantic annotations. Both the overlap and the increasing diversity in the capabilities and in the vocabularies of the annotators motivate the need for managing and integrating semantic annotations in a coherent and uniform fashion. ROSEANN is a framework for the management and the reconciliation of semantic annotations. It provides end-users and programmers with a unified view over the results of multiple online and standalone annotators, linking them to an integrated ontology of their vocabularies, and supporting a variety of document formats such as plain text, live Web pages, and PDF documents. Although ROSEANN provides two pre-defined algorithms for conflict resolution (one supervised, appropriate when representative training data is available, and one unsupervised), it also allows application developers to define their own integration techniques, as well as extending the pool of annotators as new ones become available.

“…The data wrangling functionality, for example for mapping generation or format transformation, is implemented as a collection of loosely coupled components that build on the concept of a relational transducer (or simply transducer). Transducers were introduced by Abiteboul et al. [5], and have been successfully applied and extended in a variety of applications [10], including web data extraction [11].…”

Section: Transducers (mentioning)

confidence: 99%

“…
• δ_guard [11] are Vadalog rules that describe whether a transducer is ready to be executed.
• δ_scopes [11] are Vadalog rules that describe the scope of the transducer (i.e., parts of the knowledge base and external sources that the transducers depend on).
• δ_map are Vadalog rules that describe the mapping between the knowledge base and other external schemata, and the internal schema of the transducer.
…”

Section: Transducers (mentioning)

confidence: 99%

VADA: an architecture for end user informed data preparation

et al. 2019

Self Cite

Background: Data scientists spend considerable amounts of time preparing data for analysis. Data preparation is labour intensive because the data scientist typically takes fine grained control over each aspect of each step in the process, motivating the development of techniques that seek to reduce this burden. Results: This paper presents an architecture in which the data scientist need only describe the intended outcome of the data preparation process, leaving the software to determine how best to bring about the outcome. Key wrangling decisions on matching, mapping generation, mapping selection, format transformation and data repair are taken by the system, and the user need only provide: (i) the schema of the data target; (ii) partial representative instance data aligned with the target; (iii) criteria to be prioritised when populating the target; and (iv) feedback on candidate results. To support this, the proposed architecture dynamically orchestrates a collection of loosely coupled wrangling components, in which the orchestration is declaratively specified and includes self-tuning of component parameters. Conclusion: This paper describes a data preparation architecture that has been designed to reduce the cost of data preparation through the provision of a central role for automation. An empirical evaluation with deep web and open government data investigates the quality and suitability of the wrangling result, the cost-effectiveness of the approach, the impact of self-tuning, and scalability with respect to the numbers of sources.

“…Another interesting method in the same family is the Ontology-based web Pattern Analysis with Logic (OPAL) method (Furche, 2011) (Furche, 2013). This method does not take in entry a set of deep web sources URL but only the domain name of interest.…”

Section: The Form Integration Approach (mentioning)

confidence: 99%

Discovering the Deep Web through XML Schema Extraction. Saissi, Zellou, Idri. 2016. Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management.

The web accessible by the search engines contains a vast amount of information. However, there is another part of the web called the deep web accessible only through its associated HTML forms, and containing much more information. The integration of the deep web content presents many challenges that are not fully addressed by the actual deep web access approaches. The integration of the deep web data requires knowing the schema describing each deep web source. This paper presents our approach to extract the XML schema describing a selected deep web source. The XML schema extracted will be used to integrate the associated deep web source into a mediation system. The principle of our approach is to apply a static and a dynamic analysis to the HTML forms giving access to the selected deep web source. We describe the algorithms of our approach and compare it to the other existing approaches.
