The study of science at the individual scholar level requires the disambiguation of author names. The creation of authors’ publication oeuvres involves matching a list of unique author names to the names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem in large-scale bibliometric analysis that uses data from multiple databases. This study introduces and tests a new methodology called seed + expand for semi-automatic bibliographic data collection for a given set of individual authors. Specifically, we identify the oeuvres of a set of Dutch full professors during the period 1980–2011. To do so, we combine author records from the Dutch National Research Information System (NARCIS) with publication records from the Web of Science. Starting with an initial list of 8,378 names, we identify ‘seed publications’ for each author using five different approaches. Subsequently, we ‘expand’ the set of publications using three different approaches. The different approaches are compared, and the resulting oeuvres are evaluated on precision and recall using a ‘gold standard’ dataset of authors for whom verified publications in the period 2001–2010 are available.
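As a rough illustration of the evaluation step, the sketch below computes precision and recall for a single author's retrieved oeuvre against a verified publication list. It uses the standard set-based definitions; the function name `precision_recall` and the publication identifiers are hypothetical and are not taken from the study, whose exact evaluation procedure is not described in this abstract.

```python
def precision_recall(retrieved, verified):
    """Return (precision, recall) of a retrieved oeuvre against a verified one.

    Both arguments are iterables of publication identifiers.
    """
    retrieved, verified = set(retrieved), set(verified)
    # Publications that were retrieved and are also in the verified list.
    true_positives = len(retrieved & verified)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(verified) if verified else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
retrieved_oeuvre = {"pub1", "pub2", "pub3", "pub4"}
gold_standard = {"pub2", "pub3", "pub4", "pub5", "pub6"}

precision, recall = precision_recall(retrieved_oeuvre, gold_standard)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case, three of the four retrieved publications are verified, and three of the five verified publications are retrieved. An expansion step that adds more candidate publications would typically raise recall at some cost to precision.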
The work outlines the development of a data curation and data provenance framework in the EUDAT Collaborative Data Infrastructure. Practical use cases are described, as well as results of defining and implementing data curation policies and data provenance patterns.
The web not only enables new forms of science, it also creates new possibilities to study science and new digital scholarship. This paper brings together multiple perspectives: from individual researchers seeking the best options to display their activities and market their skills on the academic job market, to academic institutions, national funding agencies, and countries needing to monitor the science system and account for public money spending. We also address research interests aimed at better understanding the self-organising and complex nature of the science system through researcher tracing, the identification of the emergence of new fields, and knowledge discovery using large-scale data mining and non-linear dynamics. In particular, this paper draws attention to the need for standardisation and data interoperability in the area of research information as an indispensable precondition for any science modelling. We discuss which levels of complexity are needed to provide a globally interoperable and expressive data infrastructure for research information. With possible dynamic science model applications in mind, we introduce the need for a "middle-range" level of complexity for data representation and propose a conceptual model for research data based on a core international ontology with national and local extensions.