Proceedings of the 2010 EDBT/ICDT Workshops
DOI: 10.1145/1754239.1754287

Using visual pages analysis for optimizing web archiving

Abstract: Due to the growing importance of the World Wide Web, archiving it has become crucial for preserving a useful source of information. To keep a web archive up to date, crawlers harvest the web by iteratively downloading new versions of documents. However, crawlers frequently retrieve pages with unimportant changes, such as advertisements that are continually updated. Hence, web archive systems waste time and space indexing and storing useless page versions. Also, querying the archive can take more…

Cited by 16 publications (7 citation statements)
References: 16 publications
“…et al [21] uses VIPs algorithm to identify if the change in the page is important for archiving or not. Their work aims to enhance the efficiency of web page archiving.…”
Section: Vision-based Approach (mentioning)
confidence: 99%
“…• Change importance estimation In this phase, the successive delta files generated during a day are evaluated by the function [5] to estimate the importance of changes. This function depends of three major parameters; (i) the importance of each block composing the page, (ii) the importance of changes operations (insertion, deletion, etc.)…”
Section: Pattern Discovery Phases (mentioning)
confidence: 99%
“…As detailed in [5], the importance of changes detected in the delta ∆ is computed by the following function based on three major parameters:…”
Section: Principle (mentioning)
confidence: 99%
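The statements above say that the change-importance function of [5] combines block importance, the importance of change operations, and a third parameter that is elided in the citation. A minimal sketch of how the two named parameters could be combined is shown below. It is not the function from [5]: the weighted sum, the operation weights, the threshold, and the names `Change`, `OPERATION_WEIGHTS`, `change_importance` and `worth_archiving` are all hypothetical, and the third parameter is deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical weights per change operation; the values actually used in [5]
# are not given in the citation statements.
OPERATION_WEIGHTS = {"insert": 1.0, "delete": 1.0, "update": 0.5, "move": 0.2}


@dataclass
class Change:
    block_id: str   # id of the page block touched by the change
    operation: str  # one of the keys of OPERATION_WEIGHTS


def change_importance(delta: list[Change], block_importance: dict[str, float]) -> float:
    """Assumed score: sum of (block importance x operation weight) over the delta.

    Only two of the three parameters attributed to [5] are modelled here;
    the third is elided in the citation and deliberately left out.
    Blocks missing from `block_importance` contribute nothing.
    """
    return sum(
        (block_importance.get(c.block_id, 0.0) * OPERATION_WEIGHTS[c.operation]
         for c in delta),
        0.0,
    )


def worth_archiving(delta: list[Change], block_importance: dict[str, float],
                    threshold: float = 0.5) -> bool:
    """Hypothetical rule: keep the new version only if the score reaches the threshold."""
    return change_importance(delta, block_importance) >= threshold
```

For example, with block importances `{"main": 0.8, "ads": 0.05}`, a delta made only of two updates to the `ads` block scores about 0.05 and would be skipped under this rule, while adding an insertion in the `main` block raises the score to about 0.85. This mirrors the abstract's motivating case of continually updated advertisements.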
“…Web page segmentation refers to the process of dividing a Web page into visually and semantically coherent segments called blocks. Detecting these different blocks is a crucial step for many applications, such as mobile applications [20], information retrieval [5], web archiving [14], among others. In the context of Web archiving, segmentation can be used to extract interesting parts to be stored.…”
Section: Introduction (mentioning)
confidence: 99%
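The last statement describes segmentation as a way to extract the interesting parts of a page for storage. One way to picture this is a tree of blocks from which only the important leaves are kept. The sketch below assumes that a segmenter (VIPS or another) has already produced the tree and assigned each leaf an importance score. The `Block` type, the scores, and the threshold are hypothetical, and the segmentation step itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: str
    importance: float = 0.0  # hypothetical score assigned after segmentation
    text: str = ""
    children: list["Block"] = field(default_factory=list)


def interesting_blocks(root: Block, threshold: float) -> list[Block]:
    """Return leaf blocks whose importance is at least `threshold`, in document order."""
    selected: list[Block] = []
    stack = [root]
    while stack:
        block = stack.pop()
        if block.children:
            # Push children in reverse so they are popped left to right.
            stack.extend(reversed(block.children))
        elif block.importance >= threshold:
            selected.append(block)
    return selected
```

The traversal uses an explicit stack instead of recursion, so deeply nested layouts cannot hit Python's recursion limit. Pushing children in reverse keeps the selected blocks in the order they appear on the page.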