Institutions, funding bodies, and national research organizations are pushing for more data sharing and FAIR data. Institutions typically implement data policies, frequently supported by an institutional data repository. Funders
typically mandate data sharing. So where does this leave the researcher? How can researchers benefit from doing the additional work to share their data?
In order to make sure that researchers and institutions get credit for sharing their data, the data needs to be tracked and attributed first. In this paper we investigated where the research data ended up for 11 research institutions, and how this data is currently tracked and attributed. Furthermore, we also analysed the gap between the research data that is currently in institutional repositories, and where their researchers truly share their data.
We found that 10 out of 11 institutions have most of their public research data hosted outside of their own institution. Combined, they have 12% of their institutional research data published in the institutional data repositories. According to our data, the typical institution had 5% of their research data (median) published in the institutional repository, but there were 4 universities for which it was 10% or higher.
By combining existing data-to-article graphs with existing article-to- researcher and article-to-institution graphs it becomes possible to increase tracking of public research data and therefore the visibility of researchers sharing their data typically by 17x. The tracking algorithm that was used to perform analysis and report on potential improvements has subsequently been implemented as a standard method in the Mendeley Data Monitor product. The improvement is most likely an under-estimate because, while the recall for datasets in institutional repositories is 100%, that is not the case for datasets published outside the institutions, so there are even more datasets still to be discovered.