In a large variety of applications, the long-term, guaranteed availability of data is becoming increasingly important. Long-term digital preservation systems therefore have to be inherently distributed so that content can be replicated. This affects the preservation of both the actual digital objects and their associated metadata. For the latter, RDF has become the prevalent data model. Ensuring data integrity and consistency requires periodic checks that detect inconsistencies in a timely manner, for instance those caused by (partial) hardware failures, and trigger repair actions. Hence, the access characteristics of metadata in long-term digital preservation differ significantly from those of metadata management in other types of applications. In addition, the increasing size of digital archives makes consistency checks of the associated metadata more challenging. In this paper, we introduce a novel benchmark for triple store-based metadata management that jointly takes into account the specific access patterns of long-term preservation systems: (i) complex periodic consistency checks, (ii) concurrent read and write requests to the archive, and (iii) the actions to be taken on data to re-establish consistency once a violation has been detected. Furthermore, we present the results of this benchmark applied to our distributed long-term digital preservation system DISTARNET.