2015
DOI: 10.1142/s0218213015400059

A Scalable Approach to Harvest Modern Weblogs

Abstract: Blogs are one of the most prominent means of communication on the web. Their content, interconnections and influence constitute a unique socio-technical artefact of our times that needs to be preserved. The BlogForever project has established best practices and developed an innovative system to harvest, preserve, manage and reuse blog content. This paper presents the latest developments of the blog crawler, a key component of the BlogForever platform. More precisely, our work concentrates on technique…

Cited by 1 publication (5 citation statements).
References: 14 publications.

“…Using the framework we developed in the context of WebGraph-It to enable easy web crawling algorithm implementation (Section 4.3.2), we aim to evolve existing web crawling algorithms and create new [ones]. Also, we plan to implement our methods in other open existing web crawlers such as the BlogForever platform [16]. Finally, we aim to launch a public web service via http://webgraph-it.com to provide users with web crawling and webgraph generation services.…”
Section: Discussion (mentioning), confidence: 99%

“…Debian Linux operating system [118] for development and production servers, 2. Nginx web server [footnote 16] to [serve] static web content, The home page of the ArchiveReady system is presented in Figure 3.5. An overview of the system architecture is presented in Figure 3.4.…”
Section: System Architecture (mentioning), confidence: 99%