In recent years, the idea of collecting and combining large public data sets and services has become increasingly popular.
A Universal Storage based on DHTs

An increasing number of applications on the Web are based on the idea of collecting and combining large public data sets and services. In such public data management scenarios, the information, its structure, and its semantics are controlled by a large number of participants. Although these systems are conceptually distributed or decentralized with respect to data, their supporting infrastructures are still based on inherently centralized concepts. The downsides of such centralized systems at the physical layer, such as bottlenecks, single points of failure, and the enormous cost of providing the needed resources, are compounded by problems on a more logical level, e.g., the problem of integrating data and services and the need for database processing functionality. Examples of such applications include (specialized) Web search engines, scientific database applications, naming or directory services, and "social" applications such as file/picture sharing, encyclopedias, friend-of-a-friend networks, or recommender systems.

In this paper, we argue for a decentralization of data management by creating a universal distributed storage for such public data/metadata, which exploits the gigantic storage and processing capacity of the Internet nodes available worldwide in the same way as the network layer exploits the worldwide communication devices for routing messages between nodes. Information sources are highly distributed, data is described according to heterogeneous schemas, no participant has a global view of all information, and data and service quality can only be guaranteed on a best-effort basis.
In this context, the global challenge is to develop a lightweight, generic data management component playing the same role as the TCP/IP stack, as well as a highly scalable infrastructure that enforces a fair distribution of storage and processing load in a highly dynamic world without any central control.

For this type of public information management, DHT-based overlay systems offer an interesting alternative to existing information system architectures. While problems such as scalability, robustness, and fair balancing of load and work are covered by modern DHTs, new research problems have to be addressed. The most prominent are that data may exist in a large number of different schema organizations, and that the expressiveness of queries and the possible guarantees (existence, completeness, etc.) are currently limited.

For a distributed universal storage as we propose it, the key issues can be classified along three questions: (1) How should data be structured and organized in massively distributed settings? (2) How can data be queried, and how can it be queried efficiently? (3) What is needed to obtain a robust and practical solution?

The first question raises two main problems: we need a generic and flexible schema for structuring data, and we have to deal with heterogeneities at both the schema level and the data level. The ...