2012
DOI: 10.1007/978-3-642-31753-8_27

Turn the Page: Automated Traversal of Paginated Websites

Abstract: Content-intensive web sites, such as Google or Amazon, paginate their results to accommodate limited screen sizes. Thus, human users and automatic tools alike have to traverse the pagination links when they crawl the site, extract data, or automate common tasks, where these applications require access to the entire result set. Previous approaches, as well as existing crawlers and automation tools, rely on simple heuristics (e.g., considering only the link text), falling back to an exhaustive exploration of the…
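The abstract contrasts the paper's approach with simple heuristics that consider only the link text. The following minimal Python sketch illustrates that baseline heuristic. It is not the paper's method. It assumes a hypothetical, hand-picked keyword list `NEXT_LABELS` and uses only the standard-library `html.parser` module. `AnchorCollector` and `find_next_links` are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical keyword list for the link-text heuristic (an assumption, not
# taken from the paper). Localized labels, icons, or image-only links fall
# outside any such list, which is what makes this baseline brittle.
NEXT_LABELS = {"next", "next page", "more", ">", "»", "›"}


class AnchorCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element with an href."""

    def __init__(self):
        super().__init__()       # convert_charrefs=True: "&raquo;" arrives as "»"
        self.anchors = []        # list of (href, text) tuples
        self._href = None        # href of the <a> currently open, if any
        self._text = []          # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "A"/"HREF" also match.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.anchors.append((self._href, text))
            self._href = None


def find_next_links(html):
    """Return hrefs whose normalized link text exactly matches a 'next' label."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.anchors if text.lower() in NEXT_LABELS]


page = ('<div><a href="/p/1">1</a> <a href="/p/2">2</a> '
        '<a href="/p/2">Next</a> <a href="/about">About</a></div>')
print(find_next_links(page))  # ['/p/2']
```

A link labelled "Weiter" or rendered only as an arrow icon would return no match. This is the kind of failure that forces such tools back onto exhaustive exploration of the site, as the abstract notes.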

Cited by 9 publications (5 citation statements)
References 16 publications

“…Most Web sites use pagination to present a set number of results in Web search results or demonstrating a set number of posts while seeing a discussion string. Pagination can be used as a part of some structure on each Web application to isolate returned information and show it on different pages (Furche, et al, 2012). Pagination additionally incorporates a rationale of planning and showing connections to different pages.…”
Section: Pagination In Web Sites (mentioning)
confidence: 99%
“…Location maintenance is addressed through both preventive and curative approaches (see Grace et al [2011] for an overview). The former attempts to avoid locator failure by providing more robust XPath expressions [Kowalkiewicz et al 2006;Paz and Díaz 2010].…”
Section: Maintainability (mentioning)
confidence: 99%
“…Consider the problem of identifying semantic blocks on a web page, such as pagination bars, navigation menus, headers, footers, and sidebars [26,35]. The page is represented by the DOM tree and CSS model.…”
Section: Feature Extraction With Vadalog (mentioning)
confidence: 99%