Knowledge-intensive applications pose new challenges to metadata management, including distribution, access control, uniformity of access, and evolution over time. The authors identify general requirements for metadata management and describe a simple model and service that focus on Resource Description Framework (RDF) metadata to address these requirements.
In important application domains, data and service providers are increasingly making their resources publicly available to the community for re-use in complex workflows. To make those resources useful in practice, however, providers must also supply annotations that describe the nature and function of the data and services. This is the case for e-science, the realm of in silico experiments or "procedures that use computer-based information repositories and computational analysis tools to test a hypothesis, derive a summary, search for patterns, or demonstrate a known fact."1 In e-science, it's quite common for providers and consumers to independently add annotations to resources to facilitate their discovery or to record details of their use as part of an experiment. Thus, when scaled to hundreds or thousands of resources and users of those resources, the annotations themselves will form a new and large corpus of heterogeneous metadata distributed over many organizations, with no central control over its maintenance. As a new and complex type of data resource, such metadata requires some form of management to be of any practical use.
In this article, we present a middleware service for metadata management that addresses this issue. Its design is based on the observation that, regardless of differences in format and content, two simple properties are common to all metadata: it's invariably associated with some underlying resource, and, optionally, separate meta-information for interpreting the metadata (an ontology, for instance) might be available. We refer to such meta-information as a knowledge entity to underline the fact that it's used to interpret the metadata. This is
A Service-Oriented Approach to Metadata Management
To illustrate the need for annotations, let's take a look at the myGrid infrastructure for e-Science (www.mygrid.org.uk). myGrid offers a middleware services suite to facilitate the specification of in silico experiments, mainly in the bioinformatics domain.2 In particular, biologists can compose and execute scientific workflows like the one in Figure 1 (the workflow is part of a myGrid collection, available at http://workflows.mygrid.org.uk/repository/myGrid) that orchestrate access to multiple resources (public databases and data analysis tools, for example) and use them to derive biologically significant results. We can view myGrid workflows, specified using the Taverna workflow model,3 as a composition of Web services. Hundreds of Taverna services for biology, for example, are currently available through bioinformatics domain experts' contributions. Service providers can create several types of annotations to facilitate the ...