2019
DOI: 10.1371/journal.pbio.3000269

Rxivist.org: Sorting biology preprints using social media and readership metrics

Abstract: Preprints have arrived. In increasing numbers, researchers across the life sciences are embracing the once-niche practice, shaking off decades of reluctance and posting hundreds of papers per week to preprint servers, sharing their findings with the community before embarking on the weary march through peer review. However, there are limited methods for individuals sifting through this avalanche of research to identify the preprints that are most relevant to their interests. Here, we describe Rxivi…

Cited by 10 publications (9 citation statements)
References 16 publications

“…We chose to focus on bioRxiv for several reasons: primarily, bioRxiv is the preprint server most broadly integrated into the traditional publishing system (Barsh et al, 2016; Vence, 2017; eLife, 2020). In addition, bioRxiv currently holds the largest collection of biology preprints, with metadata available in a format we were already equipped to ingest (Abdill and Blekhman, 2019c). Analyzing data from only a single repository also avoids the issue of different websites holding metadata that is mismatched or collected in different ways.…”
Section: Discussion
confidence: 99%
“…We used existing data from the Rxivist web crawler (Abdill and Blekhman, 2019c) to build a list of URLs for every preprint on bioRxiv.org. We then used this list as the input for a new tool that collects author data: we recorded a separate entry for each author of each preprint, and stored name, email address, affiliation, ORCID identifier, and the date of the most recent version of the preprint that has been indexed in the Rxivist database.…”
Section: Methods
confidence: 99%
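The "one entry per author of each preprint" layout described in the statement above can be pictured with a minimal Python sketch. It is an illustration only, not the citing authors' tool: the `AuthorEntry` class, the `author_entries` function, the dictionary keys, and the sample values are all hypothetical; only the stored fields (name, email address, affiliation, ORCID identifier, date of the most recent indexed version) come from the quoted text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AuthorEntry:
    """One row per (preprint, author) pair; field names are assumptions."""
    preprint_url: str
    name: str
    email: Optional[str]
    affiliation: Optional[str]
    orcid: Optional[str]
    latest_version_date: str


def author_entries(preprint: Dict) -> List[AuthorEntry]:
    """Flatten one preprint record into a separate entry for each author."""
    return [
        AuthorEntry(
            preprint_url=preprint["url"],
            name=author["name"],
            # Optional fields fall back to None when the record lacks them.
            email=author.get("email"),
            affiliation=author.get("affiliation"),
            orcid=author.get("orcid"),
            latest_version_date=preprint["latest_version_date"],
        )
        for author in preprint["authors"]
    ]


# Hypothetical input record; the URL, names, and date are placeholders.
example = {
    "url": "https://www.biorxiv.org/content/EXAMPLE",
    "latest_version_date": "2019-01-01",
    "authors": [
        {"name": "A. Author", "email": "a.author@example.org",
         "affiliation": "Example University", "orcid": None},
        {"name": "B. Author"},
    ],
}

entries = author_entries(example)
```

Keeping one flat entry per author, rather than nesting authors inside each preprint, makes per-author counts and joins straightforward, at the cost of repeating the preprint-level fields on every row.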
“…We chose to focus on bioRxiv for several reasons: Primarily, bioRxiv is the preprint server most broadly integrated into the traditional publishing system (see Introduction) (Barsh et al 2016; Vence 2017; Eisen 2019). In addition, bioRxiv currently holds the largest collection of biology preprints, with metadata available in a format we were already equipped to ingest (Abdill and Blekhman 2019c). Analyzing data from only a single repository also avoids the issue of different websites holding metadata that is mismatched or collected in different ways.…”
Section: Discussion
confidence: 99%
“…Rxivist [41] is a Python-based web crawler that parses the bioRxiv website, detects newly posted preprints, and stores metadata about each item in a PostgreSQL database. The metadata we extracted contained title, authors, submission date, category, DOI for preprint and, if published, the new DOI and the journal of publication.…”
Section: Terminology
confidence: 99%
“…When a preprint is published in a peer-review journal, a reference to the new DOI of the journal article appears next to its title, and DOIs of a preprint and a published article are permanently linked in indexing platforms and tools, which pull from various APIs. Rxivist [41] showed to be an excellent tool for extracting published DOIs for preprints eventually appearing as peer-reviewed journal articles but only when bioRxiv records linked preprints to their external publications. Rxivist also had a two weeks delay in updating its metrics, and it might be of this delay that some peer-reviewed preprint analogues were missing from Rxivist.…”
Section: Data Challenges and Study Limitations
confidence: 99%