Social media content and user participation have increased dramatically since the advent of Web 2.0. Blogs have become relevant to every aspect of business and personal life. Nevertheless, we do not have the right tools to aggregate and preserve blog content correctly, or to manage blog archives effectively. Given the rising importance of blogs, it is crucial to build systems that facilitate blog preservation, safeguarding an essential part of our heritage that will prove valuable for current and future generations. In this paper, we present our work in progress towards building a novel blog preservation platform featuring robust digital preservation, management and dissemination facilities for blogs. This work is part of the BlogForever project, which aims to make an impact on the theory and practice of blog preservation by creating guidelines and software that any individual or organization can use to preserve their blogs.
Blogs are one of the most prominent means of communication on the web. Their content, interconnections and influence constitute a unique socio-technical artefact of our times which needs to be preserved. The BlogForever project has established best practices and developed an innovative system to harvest, preserve, manage and reuse blog content. This paper presents the latest developments of the blog crawler, which is a key component of the BlogForever platform. More precisely, our work concentrates on techniques to automatically extract content such as articles, authors, dates and comments from blog posts. To achieve this goal, we introduce a simple yet robust and scalable algorithm that generates extraction rules based on string matching, using the blog's web feed in conjunction with the blog's hypertext. Furthermore, we present a system architecture characterised by efficiency, modularity, scalability and interoperability with third-party systems. Finally, we conduct thorough evaluations of the performance and accuracy of our system.
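The abstract does not give implementation details; the following standalone sketch illustrates the general idea of deriving an extraction rule by string-matching a feed entry's excerpt against the post's HTML. The helper names, the choice of feedparser, requests and BeautifulSoup, and the tag-class path used as the rule format are assumptions for illustration, not the project's actual code.

# Illustrative sketch only -- not the BlogForever crawler's actual code.
# Idea: string-match each feed entry's excerpt against the post HTML and
# record the path of the matching element as a blog-wide extraction rule.
import feedparser                     # pip install feedparser
import requests                       # pip install requests
from bs4 import BeautifulSoup         # pip install beautifulsoup4


def element_path(element):
    """Build a simple "tag.class > tag.class" path usable as an extraction rule."""
    parts = []
    for node in [element] + list(element.parents):
        if node.name in (None, "[document]"):
            break
        classes = ".".join(node.get("class", []))
        parts.append(f"{node.name}.{classes}" if classes else node.name)
    return " > ".join(reversed(parts))


def derive_content_rule(feed_url, sample_size=5):
    """Return the most common path of the element containing each sampled post's excerpt."""
    feed = feedparser.parse(feed_url)
    paths = []
    for entry in feed.entries[:sample_size]:
        html = requests.get(entry.link, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        summary = entry.get("summary", "")
        excerpt = BeautifulSoup(summary, "html.parser").get_text(" ", strip=True)[:150]
        match = None
        for el in soup.find_all(True):
            # Elements come in document order, so the last match is typically
            # the most specific container of the excerpt.
            if excerpt and excerpt in el.get_text(" ", strip=True):
                match = el
        if match is not None:
            paths.append(element_path(match))
    return max(set(paths), key=paths.count) if paths else None

The same match-and-generalise idea could in principle be applied to author names and dates taken from the feed metadata, which the paper lists among the extracted content.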
Europeana has stretched many established digital library procedures, imposing requirements that are difficult to implement in many small institutions, which often lack dedicated systems support personnel. Although freely available open source software platforms provide most of the commonly needed functionality, such as OAI-PMH support, migration from legacy software may not be easy, possible or desired. Furthermore, advanced requirements such as selective harvesting according to complex criteria are not widely supported. To accommodate these needs and help institutions contribute their content to Europeana, we developed a series of tools. For the majority of small content providers running DSpace, we developed a DSpace plugin that converts and augments Dublin Core metadata according to the Europeana ESE requirements. For sites with different software that is incompatible with OAI-PMH, we developed wrappers enabling repeatable generation and harvesting of ESE-compatible metadata via OAI-PMH. In both cases, the system can select and harvest only the desired metadata records, according to a variety of configuration criteria of arbitrary complexity. We applied our tools to providers with sophisticated needs, and present the benefits they achieved.
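As a rough illustration of the convert-and-filter step described above (the real plugin is written against the DSpace platform; this self-contained sketch uses a simplified ESE subset and an invented selection rule), a Dublin Core record can be augmented with Europeana elements and tested against a harvesting criterion as follows.

# Standalone sketch -- not the DSpace plugin itself. It shows the shape of the
# transformation: copy the Dublin Core fields, add a few Europeana ESE elements,
# and apply a (here, invented) selection criterion for harvesting.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ESE = "http://www.europeana.eu/schemas/ese/"
ET.register_namespace("dc", DC)
ET.register_namespace("europeana", ESE)


def dc_to_ese(dc_record, provider, item_url, ese_type="TEXT"):
    """Wrap a DC record into an ESE-style record with added europeana:* elements."""
    record = ET.Element(f"{{{ESE}}}record")
    for field in list(dc_record):             # keep the original dc:* elements
        record.append(field)
    ET.SubElement(record, f"{{{ESE}}}provider").text = provider
    ET.SubElement(record, f"{{{ESE}}}type").text = ese_type
    ET.SubElement(record, f"{{{ESE}}}isShownAt").text = item_url
    return record


def selected_for_harvesting(dc_record, required_tag="europeana"):
    """Selective harvesting: expose only records whose dc:subject carries a given tag."""
    subjects = dc_record.findall(f"{{{DC}}}subject")
    return any(required_tag in (s.text or "").lower() for s in subjects)


# Hypothetical usage with a minimal DC record.
dc_record = ET.fromstring(
    f'<record xmlns:dc="{DC}">'
    '<dc:title>Old postcard</dc:title><dc:subject>Europeana</dc:subject></record>'
)
if selected_for_harvesting(dc_record):
    ese = dc_to_ese(dc_record, "Example Library", "http://example.org/item/1")
    print(ET.tostring(ese, encoding="unicode"))

In the actual tools, the selection criterion is driven by configuration rather than a fixed subject tag, which is what allows criteria of arbitrary complexity.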