2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2017
DOI: 10.1109/jcdl.2017.7991601

Impact of URI Canonicalization on Memento Count

Abstract: Quantifying the captures of a URI over time is useful for researchers to identify the extent to which a Web page has been archived. Memento TimeMaps provide a format to list mementos (URI-Ms) for captures along with brief metadata, like Memento-Datetime, for each URI-M. However, when some URI-Ms are dereferenced, they simply provide a redirect to a different URI-M (instead of a unique representation at the datetime), often also present in the TimeMap. This implies that confidently obtaining an accurate count qu…

Cited by 9 publications (5 citation statements)
References 10 publications
“…In previous work [15], we highlighted that the URI-Ms in a TimeMap for google.com produce nearly 85% HTTP redirects when dereferenced. Determining how many mementos exist from an archive for a URI-R is thereby impossible from a TimeMap alone.…”
Section: Content-based Attributes (mentioning)
confidence: 91%
“…Additional information about a URI-M would be useful if present in a TimeMap. For example, knowing the HTTP status code of the dereferenced URI-M would reduce the amount of time needed to determine unique captures in the archive [15]. Extending TimeMaps may also provide the facility for the integration of private and public Web archives.…”
Section: Background and Related Work, 2.1 Archiving and Linked Data (mentioning)
confidence: 99%
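The statement above suggests that knowing each URI-M's HTTP status code would make it faster to separate redirects from candidate unique captures. A minimal sketch of that tally follows. It is an illustration only, not the cited authors' method: the function name `count_non_redirect_mementos`, the `(uri_m, status)` input shape, and the sample URI-Ms are all hypothetical, and the sketch assumes the status codes have already been collected by some other means.

```python
from collections import Counter


def count_non_redirect_mementos(entries):
    """Split TimeMap entries into non-redirects and 3xx redirects.

    `entries` is an iterable of (uri_m, http_status) pairs. How the
    statuses were obtained is outside the scope of this sketch.
    Returns (non_redirect_count, redirect_count).
    """
    # Group by status class: 2 for 2xx, 3 for 3xx, and so on.
    by_class = Counter(status // 100 for _, status in entries)
    total = sum(by_class.values())
    redirects = by_class[3]
    return total - redirects, redirects


# Hypothetical sample data, for illustration only.
sample = [
    ("https://archive.example/web/20170101000000/http://example.com/", 200),
    ("https://archive.example/web/20170102000000/http://example.com/", 302),
    ("https://archive.example/web/20170103000000/http://example.com/", 301),
    ("https://archive.example/web/20170104000000/http://example.com/", 200),
]
non_redirects, redirects = count_non_redirect_mementos(sample)
print(non_redirects, redirects)
```

A non-redirect response is only a proxy for a unique capture; archived error pages, for instance, would still be counted here.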
“…Most research involving Memento aggregation relates to usage of the aggregator rather than enhancement of the aggregation process. In the same way that prior to MemGator, researchers would state "we requested URIs from the Time Travel Service", this statement was transformed to "we used MemGator to request URIs", indicative that it was useful for researchers to utilize their own aggregator instance [21,14,4]. A facet of this use case is the ability for researchers to customize the set of web archives to be used as the basis for querying, which is performed prior to running MemGator by modifying a configuration file.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, when we analyzed our data, we found that there were many fluctuations between "200" and "404", where some resources changed their status codes back and forth hundreds of times. It turned out that it was caused by lack of proper URI normalization/canonicalization [50], [51]. For example, when a TimeMap was requested for "apple.com" they returned "200", but for "Apple.com" or "APPLE.COM" they returned "404" instead.…”
Section: Status Code Changes Over Time (mentioning)
confidence: 99%
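The "apple.com" versus "APPLE.COM" example above comes down to hostnames being case-insensitive while the lookup treated them as distinct strings. The sketch below shows one way to fold such variants together before requesting a TimeMap, using Python's standard `urllib.parse`. The function name `canonicalize` and the choice to default bare hostnames to `http` are assumptions made for illustration; the cited work does not specify its canonicalization rules, and real archive canonicalization typically does more than this.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(uri):
    """Lowercase the scheme and host of a URI; keep path and query as-is."""
    if "://" not in uri:
        # Assumption: a bare hostname such as "Apple.com" means http.
        uri = "http://" + uri
    parts = urlsplit(uri)
    # Hostnames are case-insensitive, so fold them to lowercase.
    # Note: this also lowercases any userinfo or port text in the netloc.
    netloc = parts.netloc.lower()
    # Treat an empty path as the root path.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), netloc, path, parts.query, parts.fragment)
    )


variants = ["apple.com", "Apple.com", "APPLE.COM"]
print({canonicalize(v) for v in variants})
```

Paths and query strings are left untouched because they can be case-sensitive on the origin server.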