Felix Bensmann scite author profile

The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central

Science across all disciplines has become increasingly data-driven, leading to additional needs with respect to software for collecting, processing and analysing data. Thus, transparency about software used as part of the scientific process is crucial to understand the provenance of individual research data and insights, is a prerequisite for reproducibility, and can enable macro-analysis of the evolution of scientific methods over time. However, a lack of rigor in software citation practices renders the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices facilitated through an unprecedented knowledge graph of software mentions and affiliated metadata generated through supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions, and significantly outperforms the state of the art, leading to the most comprehensive corpus of 11.8 M software mentions, which are described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. To the best of our knowledge, this is the most comprehensive analysis of software use and citation to date. All data and models are shared publicly to facilitate further research into scientific use and citation of software.

An Infrastructure for Spatial Linking of Survey Data

Bensmann, Heling, Jünger et al. 2020

Research on environmental justice comprises health and well-being aspects, as well as topics related to general social participation. In this research field, among others, there is a need for an integrated use of social science survey data and spatial science data, e.g. for combining demographic information from survey data with data on pollution from spatial data. However, it is challenging for researchers to link both data sources, because (1) the interdisciplinary nature of both data sources is different, (2) both are subject to different legal restrictions, in particular regarding data privacy, and (3) methodological challenges arise regarding the use of geo-information systems (GIS) for the processing and analysis of spatial data. In this article, we present an infrastructure of distributed web services which supports researchers in the process of spatial linking. The infrastructure addresses the challenges researchers have to face during that process. We present an example case study on the investigation of environmental inequalities with regard to income and land use hazards in Germany by using georeferenced survey data of the GESIS Panel and the German Socio-economic Panel (SOEP), and by using spatial data from the Monitor of Settlement and Open Space Development (IOER Monitor). The results show that increasing income of survey respondents is associated with less exposure to land-use-related environmental hazards in Germany.

The RichWPS Environment for Orchestration

Bensmann, Alcacer-Labrador, Ziegenhagen et al. 2014, IJGI

Web service (WS) orchestration can be considered a fundamental concept in service-oriented architectures (SOA), as well as in spatial data infrastructures (SDI). In recent years, advanced solutions have been developed in SOA, such as realizing orchestrated web services on the basis of existing, finer-grained web services by using standardized notations and existing orchestration engines. Even if the concepts can be mapped to the field of SDI, on a conceptual level the implementations target different goals. As a specialized form of a common web service, an Open Geospatial Consortium (OGC) web service (OWS) is optimized for a specific purpose. On the technological level, web services depend on standards like the Web Service Description Language (WSDL) or the Simple Object Access Protocol (SOAP). However, OWS are different. Consequently, a new concept for OWS orchestration is needed that works on the interface provided by OWS. Such a concept is presented in this work. The major component is an orchestration engine integrated in a Web Processing Service (WPS) server that uses a domain-specific language (DSL) for workflow description. The developed concept is the basis for the realization of new functionality, such as workflow testing and workflow optimization.

Interlinking Large-scale Library Data with Authority Records

Bensmann, Zapilko, Mayr 2017, Front. Digit. Humanit.

In the area of Linked Open Data (LOD), meaningful and high-performance interlinking of different datasets has become an ongoing challenge. Necessary tasks are supported by established standards and software, e.g., for the transformation, storage, interlinking, and publication of data. Our use case Swissbib is a well-known provider for bibliographic data in Switzerland representing various libraries and library networks. In this article, a case study is presented from the project linked.swissbib.ch which focuses on the preparation and publication of the Swissbib data by means of LOD. Data available in Marc21 XML are extracted from the Swissbib system and transformed into an RDF/XML representation. From approximately 21 million monolithic records, the author information is extracted and interlinked with authority files from the Virtual International Authority File (VIAF) and DBpedia. The links are used to extract additional data from the counterpart corpora. Afterward, data are pushed into an Elasticsearch index to make the data accessible for other components. As a demonstrator, a search portal is developed which presents the additional data and the generated links to users. In addition, a REST interface is developed to enable access by other applications. A main obstacle in this project is the amount of data and the necessity of day-to-day (partial) updates. In the current situation, the data in Swissbib and in the external corpora are too large to be processed by established linking tools. The resulting memory footprint prevents these tools from functioning correctly. Triple stores are also impractical because they incur a massive overhead for import and update operations. Hence, we have developed procedures for extracting and shaping the data into a more suitable form, e.g., data are reduced to the necessary properties and blocked. For this purpose, we used sorted N-Triples as an intermediate data format. As our preliminary results show, this method is very promising. Our approach could establish 30,773 links to DBpedia and 20,714 links to VIAF. Both link sets show high precision values and could be generated in a reasonable amount of time.
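
The idea of reducing records to the necessary properties, blocking them, and working on sorted N-Triples can be illustrated with a small sort-merge join. This is only a minimal sketch under stated assumptions, not the procedure used in linked.swissbib.ch: the helper names (`parse_ntriple`, `name_key`, `keyed_records`, `merge_join`), the use of a normalized name as the blocking key, and the `example.org` IRIs are all hypothetical, and the parser handles only simple single-line triples.

```python
def parse_ntriple(line):
    """Split one simple N-Triples line into (subject, predicate, object)."""
    subject, predicate, rest = line.strip().split(" ", 2)
    rest = rest.strip()
    if rest.endswith("."):
        rest = rest[:-1].strip()  # drop the terminating dot
    return subject, predicate, rest


def name_key(obj):
    """Hypothetical blocking key: lowercased name with collapsed whitespace."""
    text = obj.split('"')[1] if obj.startswith('"') else obj
    return " ".join(text.lower().split())


def keyed_records(lines):
    """Reduce triples to (key, subject) pairs and sort them by key."""
    pairs = []
    for line in lines:
        if line.strip():
            subject, _predicate, obj = parse_ntriple(line)
            pairs.append((name_key(obj), subject))
    return sorted(pairs)


def merge_join(left, right):
    """Sort-merge join over two key-sorted lists; returns candidate links."""
    links = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == key:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every record in the left block with every record in the right block.
            for _, left_subject in left[i:i_end]:
                for _, right_subject in right[j:j_end]:
                    links.append((left_subject, right_subject))
            i, j = i_end, j_end
    return links


NAME = "<http://xmlns.com/foaf/0.1/name>"
local_triples = [
    f'<http://example.org/person/1> {NAME} "Max Frisch" .',
    f'<http://example.org/person/2> {NAME} "Friedrich Dürrenmatt" .',
]
authority_triples = [
    f'<http://example.org/authority/A> {NAME} "max  frisch" .',
    f'<http://example.org/authority/B> {NAME} "Robert Walser" .',
]

links = merge_join(keyed_records(local_triples), keyed_records(authority_triples))
print(links)
```

Because both inputs are ordered by key, only records that share a key are ever compared. The sketch sorts small lists in memory; with files that are already sorted on disk, the same merge could read both inputs sequentially, which is one plausible reason a sorted line-based format helps with the memory problems described above.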

Felix Bensmann

SoMeSci- A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles

The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central

An Infrastructure for Spatial Linking of Survey Data

The RichWPS Environment for Orchestration

Interlinking Large-scale Library Data with Authority Records
