Summary
We are in the midst of a scientific data explosion in which the rate of data growth is rapidly increasing. While large‐scale research projects have developed sophisticated data distribution networks to share their data with researchers globally, there is no such support for the many millions of research projects generating data of interest to much smaller audiences (as exemplified by the long tail scientist). In data‐oriented research, every aspect of the research process is influenced by data access. However, sharing and accessing data efficiently as well as lowering access barriers are difficult. In the absence of dedicated large‐scale storage, many have noted that there is an enormous storage capacity available via connected peers, none more so than the storage resources of many research groups. With widespread usage of the content delivery network model for disseminating web content, we believe a similar model can be applied to distributing, sharing, and accessing long tail research data in an e‐Science context. We describe the vision and architecture of a social content delivery network – a model that leverages the social networks of researchers to automatically share and replicate data on peers' resources based upon shared interests and trust. Using this model, we describe a simulator and investigate how aspects such as user activity, geographic distribution, trust, and replica selection algorithms affect data access and storage performance. From these results, we show that socially informed replication strategies are comparable with more general strategies in terms of availability and outperform them in terms of spatial efficiency. Copyright © 2016 John Wiley & Sons, Ltd.