We investigated how participation evolves in Wikidata as its editors become established members of the community. Originally conceived to support Wikipedia, Wikidata is a collaborative structured knowledge base, created and maintained by a large number of volunteers, whose data can be freely reused in other contexts. As in any other online social environment, understanding contributors' pathways to full participation helps Wikidata improve user experience and retention. We analysed how participation changes over time under the frameworks of legitimate peripheral participation and activity theory. We found that, as they engage more with the project, "Wikidatians" acquire a greater sense of responsibility for their work, interact more with the community, take on more advanced tasks, and use a wider range of tools. Previous activity in Wikipedia has varied effects. As Wikidata is a young community, future work should focus on volunteers with little or no experience in similar projects and identify ways to improve critical aspects such as engagement and data quality.
Wikidata is a collaboratively edited knowledge graph; it expresses knowledge in the form of subject-property-value triples, which can be enhanced with references to add provenance information. Understanding the quality of Wikidata is key to its widespread adoption as a knowledge resource. We analyse one aspect of Wikidata quality, provenance, in terms of the relevance and authoritativeness of its external references. We follow a two-stage approach. First, we perform a crowdsourced evaluation of references. Second, we use the judgements collected in the first stage to train a machine learning model to predict reference quality at large scale. The features chosen for the models were related to reference editing and the semantics of the triples they referred to. 61% of the references evaluated were relevant and authoritative. Bad references were often links that had changed and either stopped working or pointed to other pages. The machine learning models outperformed the baseline and were able to accurately predict non-relevant and non-authoritative references. Further work should focus on implementing our approach in Wikidata to help editors find bad references.
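As an illustration of the data model this abstract describes, the following minimal Python sketch represents a subject-property-value triple with optional references attached. The class names (`Reference`, `Statement`), the `has_provenance` helper, and the `example.org` URL are hypothetical and chosen only for illustration; they are not taken from the paper or from any Wikidata API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    """An external source attached to a statement (hypothetical structure)."""
    url: str


@dataclass
class Statement:
    """A subject-property-value triple with optional provenance."""
    subject: str   # item identifier, e.g. "Q42"
    prop: str      # property identifier, e.g. "P31"
    value: str     # value, here another item identifier
    references: List[Reference] = field(default_factory=list)

    def has_provenance(self) -> bool:
        # A statement carries provenance if at least one reference is attached.
        return len(self.references) > 0


# Illustrative triple: Q42 (Douglas Adams), P31 (instance of), Q5 (human).
# The reference URL is a placeholder, not a real source.
referenced = Statement("Q42", "P31", "Q5", [Reference("https://example.org/source")])
unreferenced = Statement("Q42", "P31", "Q5")
```

Under this sketch, judging relevance and authoritativeness would operate on each `Reference` together with the triple it supports, which is the unit of evaluation the abstract describes.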