Do only major scientific breakthroughs hit the news and social media, or does a 'catchy' title help to attract public attention? How strong is the connection between the importance of a scientific paper and the (social) media attention it receives? In this study we investigate these questions by analysing the relationship between the observed attention and certain characteristics of scientific papers from two major multidisciplinary journals: Nature Communications (NC) and Proceedings of the National Academy of Sciences (PNAS). We describe papers by features based on the linguistic properties of their titles and centrality measures of their authors in their co-authorship network. We identify linguistic features and collaboration patterns that might be indicators of future attention, and that are characteristic of different journals, research disciplines, and media sources.
Previous research has shown the existence of gender biases in the depiction of professions and occupations in search engine results. Such an unbalanced presentation might just as likely occur on Wikipedia, one of the most popular knowledge resources on the Web, since the encyclopedia has already been found to exhibit such tendencies in past studies. Under this premise, our work assesses gender bias with respect to the content of German Wikipedia articles about professions and occupations along three dimensions: male vs. female titles used (and redirects), included images of persons, and names of professionals mentioned in the articles. We further use German labor market data to assess the potential misrepresentation of a gender for each specific profession. Our findings in fact provide evidence for a systematic over-representation of men on all three dimensions. For instance, even for professional fields dominated by women, the respective articles on average still feature almost twice as many images of men; and on average, 83% of the mentioned names of professionals were male and only 17% female.
With this work, we present a publicly available dataset of the history of all the references (more than 55 million) ever used in the English Wikipedia until June 2019. We have applied a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about associated actions: creation, modifications, deletions, and reinsertions. The high accuracy of this method and the resulting dataset was confirmed via a comprehensive crowdworker labelling campaign. We use the dataset to study the temporal evolution of Wikipedia references as well as users' editing behaviour. We find evidence of a mostly productive and continuous effort to improve the quality of references: (1) there is a persistent increase in reference and document identifiers (DOI, PubMed ID, PMC, ISBN, ISSN, arXiv ID), and (2) most of the reference curation work is done by registered human editors (not bots or anonymous editors). We conclude that the evolution of Wikipedia references, including the dynamics of the community processes that tend to them, should be leveraged in the design of relevance indexes for altmetrics, and our dataset can be pivotal for such an effort.
Peer Review
https://publons.com/publon/10.1162/qss_a_00171
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.