Abstract: With the goal of harvesting all information about a given entity, in this paper we try to harvest all matching documents for a given query submitted to a search engine. The objective is to retrieve all information about, for instance, "Michael Jackson", "Islamic State", or "FC Barcelona" from data indexed by search engines, or hidden data behind web forms, using a minimum number of queries. Policies of web search engines usually do not allow access to all of the search results matching a given query. …
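The constraint in the abstract can be made concrete with a little arithmetic. Assuming a hypothetical cap of `k` viewable results per query and `N` documents matching the entity, no strategy can see all `N` documents with fewer than ⌈N/k⌉ queries. The bound is reached only if result lists never overlap, which rarely holds in practice, so it is a floor rather than an estimate. The function name is illustrative, not taken from the paper.

```python
def min_queries_lower_bound(n_matches: int, k: int) -> int:
    """Lower bound on the number of queries needed to view n_matches
    documents when each query exposes at most k results (hypothetical cap).

    Each query can contribute at most k new documents, so at least
    ceil(n_matches / k) queries are required even with zero overlap.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if n_matches < 0:
        raise ValueError("n_matches must be non-negative")
    # Integer ceiling division avoids floating-point rounding for large counts.
    return (n_matches + k - 1) // k


if __name__ == "__main__":
    # E.g. 2,500 matching documents behind a 1,000-result cap need at least 3 queries.
    print(min_queries_lower_bound(2500, 1000))
```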
“…Web content changes rapidly [95,97]. In focused web harvesting [84], whose aim is to achieve a complete harvest for a given topic, this dynamic nature of the web creates problems for users who need access to all the web data relevant to their topics of interest. Whether you are a fan following your favorite idol or a journalist investigating a topic, you may need access not only to all the relevant information but also to its recent changes and updates.…”
Section: Discussion (mentioning)
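One way to surface "recent changes and updates" is to compare two successive harvests of the same topic. The sketch below is a hypothetical illustration, not the method of the cited work: it assumes each harvest is stored as a mapping from URL to page text, and it uses SHA-256 content fingerprints to classify documents as added, removed, or changed.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect whether a page's text has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_harvests(old: dict[str, str], new: dict[str, str]) -> dict[str, set[str]]:
    """Compare two harvests (hypothetical format: URL -> page text)."""
    old_fp = {url: fingerprint(text) for url, text in old.items()}
    new_fp = {url: fingerprint(text) for url, text in new.items()}
    added = set(new_fp) - set(old_fp)    # URLs seen only in the newer harvest
    removed = set(old_fp) - set(new_fp)  # URLs that disappeared
    # URLs present in both harvests whose content hash differs.
    changed = {u for u in set(old_fp) & set(new_fp) if old_fp[u] != new_fp[u]}
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    before = {"a": "old text", "b": "stable"}
    after = {"b": "stable, now updated", "c": "brand new page"}
    print(diff_harvests(before, after))
```

Hashing whole pages flags any edit, including trivial ones such as a changed timestamp; a real harvester would likely normalize page text before fingerprinting.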
“…Surfacing approaches try to cover all the topics in a website. However, in focused web harvesting [84,86], harvesters focus on extracting all information relevant to a given query, topic, or entity.…”
Section: Focused Web Harvesting (mentioning)
“…Therefore, in focused web harvesting approaches, the content of pages can be used extensively and matched against queries to assess the relevance of pages to the given query topics. Our previous work [84,86] contributes mostly to this category of deep web access.…”
Section: Focused Web Harvesting (mentioning)
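Matching page content against a query can be as simple as a conjunctive term test. The sketch below assumes, purely for illustration, that a page is relevant when every query term occurs in its text after lowercasing and splitting on word characters; the cited work may use a richer relevance measure, and `tokenize` and `matches` are hypothetical names.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately simple, hypothetical tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


def matches(query: str, page_text: str) -> bool:
    """True if every query term occurs in the page (conjunctive match).

    An empty query matches nothing, so a blank seed cannot mark every page relevant.
    """
    query_terms = tokenize(query)
    return bool(query_terms) and query_terms <= tokenize(page_text)


if __name__ == "__main__":
    print(matches("FC Barcelona", "FC Barcelona won the match"))
    print(matches("FC Barcelona", "Real Madrid FC"))
```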
“…As an alternative, focused web harvesting techniques are applicable. In Chapter 4 [84], we defined focused web harvesting as harvesting all documents matching a given entity by querying a web search engine. For instance, information about "Bernie Sanders", "Islamic State", or "Golden Ball Award" is retrieved from data indexed by general search engines, or hidden data behind web forms, by submitting a stream of queries and retrieving the returned results.…”
Section: Introduction (mentioning)
“…For instance, information about "Bernie Sanders", "Islamic State", or "Golden Ball Award" is retrieved from data indexed by general search engines, or hidden data behind web forms, by submitting a stream of queries and retrieving the returned results. Queries are formed by adding terms to a seed query, with the goal of returning more unique documents within the limits search engines impose on the number of queries a user can submit and the number of results the user can view [84].…”
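The query-formation idea (seed query plus one added term, under a query budget and a result cap) can be sketched against a toy search engine. Everything here is an assumption for illustration: the in-memory corpus, the engine that returns the first `k` conjunctive matches sorted by document id, the budget parameter, and the term-selection rule (pick the rarest unused term among already harvested documents). It is not the term-selection strategy of the cited chapter.

```python
import re
from collections import Counter
from collections.abc import Callable


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def make_search_engine(corpus: dict[str, str], k: int) -> Callable[[str], list[str]]:
    """Hypothetical engine: conjunctive match, at most k doc ids per query."""
    index = {doc_id: tokenize(text) for doc_id, text in corpus.items()}

    def search(query: str) -> list[str]:
        q = tokenize(query)
        hits = sorted(d for d, terms in index.items() if q <= terms)
        return hits[:k]  # the result cap hides every match beyond the first k

    return search


def harvest(seed: str, search: Callable[[str], list[str]],
            corpus: dict[str, str], budget: int) -> set[str]:
    """Expand the seed query one term at a time until the budget is spent."""
    seed_terms = tokenize(seed)
    harvested = set(search(seed))
    used: set[str] = set()
    queries = 1
    while queries < budget:
        # Document frequency of candidate terms in the pages fetched so far
        # (corpus lookup stands in for downloading the returned pages).
        df = Counter()
        for doc_id in harvested:
            df.update(tokenize(corpus[doc_id]) - seed_terms - used)
        if not df:
            break
        # Hypothetical heuristic: rarest term first, ties broken alphabetically.
        term = min(df, key=lambda t: (df[t], t))
        used.add(term)
        harvested |= set(search(f"{seed} {term}"))
        queries += 1
    return harvested


if __name__ == "__main__":
    docs = {
        "d1": "barcelona match goal",
        "d2": "barcelona match stadium",
        "d3": "barcelona transfer player",
        "d4": "barcelona stadium tour",
        "d5": "madrid match goal",
    }
    engine = make_search_engine(docs, k=2)
    print(sorted(harvest("barcelona", engine, docs, budget=4)))
```

In this toy run `d3` is never reached: none of its distinctive terms appear in the pages already harvested, which illustrates why complete coverage under result caps is hard even with many queries.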