10 Simple Rules for Funding Scientific Open Source Software

Strasser, Carly; Hertweck, Kate; Greenberg, Josh; Taraborelli, Dario; Vu, Elizabeth

doi:10.5281/zenodo.6611500

Cited by 2 publications

(1 citation statement)

References 0 publications


“…Not being able to measure the impact of critical software tools that enable scientific progress makes it hard for their authors and maintainers to pursue scientific careers and to obtain funding for their work [6,7,8,9,10]. Furthermore, it makes it more difficult for other scientists to reproduce results in scientific papers, and creates barriers for funders who need to objectively evaluate the impact of their support [9,11,12,13,14].…”

Section: Introduction (mentioning)

confidence: 99%

A large dataset of software mentions in the biomedical literature

Istrate, Li, Taraborelli et al. 2022

Preprint

Self Cite


We describe the CZ Software Mentions dataset, a new dataset of software mentions in biomedical papers. Plain-text software mentions are extracted with a trained SciBERT model from several sources: the NIH PubMed Central collection and papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context and metadata, and, for a number of mentions, the disambiguated software entities and links. We extract 1.12 million unique string software mentions from 2.4 million papers in the NIH PMC-OA Commercial subset, 481k unique mentions from the NIH PMC-OA Non-Commercial subset (both gathered in October 2021) and 934k unique mentions from 3 million papers in the Publishers' collection. There is variation in how software is mentioned in papers and extracted by the NER algorithm. We propose a clustering-based disambiguation algorithm to map plain-text software mentions into distinct software entities and apply it to the NIH PubMed Central Commercial collection. Through this methodology, we disambiguate 1.12 million unique strings extracted by the NER model into 97,600 unique software entities, covering 78% of all software-paper links. We link 185,000 of the mentions to a repository, covering about 55% of all software-paper links. We describe in detail the process of building the datasets, disambiguating and linking the software mentions, as well as opportunities and challenges that come with a dataset of this size. We make all data and code publicly available as a new resource to help assess the impact of software (in particular scientific open source projects) on science.



Research software is critical to the future of AI-driven research

2024

Preprint


By Michelle Barker, Kim Hartley, Daniel S. Katz, Richard Littauer, Qian Zhang, Shurui Zhou, Jyoti Bhogal. August 2024. [This blog post has been cross-posted by the Netherlands eScience Center, Software Sustainability Institute, and US-RSE.] Abstract: This position paper provides a statement on the criticality of research software in artificial intelligence (AI)-driven research and makes…
