Proceedings of the 2019 International Conference on Management of Data 2019
DOI: 10.1145/3299869.3319899

Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment

Cited by 9 publications
(2 citation statements)
References 34 publications
“…Madhusudan and Poonam (2017) discuss how the challenge of crawling the deep Web can be approached since it makes up approximately 96% of all Web content. Recently, Wang et al (2019) proposed SmartCrawl, which is designed to maximize the return of hidden records from a database, given a set of queries and a fixed budget (e.g., number of API calls per day). Meschenmoser et al (2016) outline and provide solutions for the common challenges when crawling scientific resources like pagination (splitting the results into pages), dynamic contents (page updates when scrolling to the bottom), and access barriers like obfuscated URL parameters and robot detection mechanisms.…”
Section: Components (mentioning)
confidence: 99%
“…Matching revolves around providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources, such as SQL and XML schemata, entity-relationship diagrams, ontology descriptions, interface definitions, etc. The need for schema matching arises in a variety of domains including linking datasets and entities for data discovery [27,57,58], finding related tables in data lakes [63], data enrichment [59], aligning ontologies and relational databases for the Semantic Web [24], and document format merging (e.g., orders and invoices in e-commerce) [52]. As an example, a shopping comparison app that supports queries such as "the cheapest computer among retailers" or "the best rate for a flight to Boston in September" requires integrating and matching several data sources of product orders and airfare forms.…”
Section: Introduction (mentioning)
confidence: 99%