Taxonomic names remain fundamental to linking biodiversity data, but information on these names resides in separate silos. Although taxonomic databases often make their contents available in RDF, their records are rarely linked to identifiers in external databases, such as DOIs for publications or ORCIDs for people. This paper explores how author names in publication databases such as CrossRef and ORCID can be reconciled with author names in a taxonomic database using existing vocabularies and SPARQL queries.
Linking taxonomic names

We can represent "core" biodiversity data as a network of connected entities, such as taxa and their names, publications, people, species, macromolecular sequences, images, and natural history collections [9]. Creating a "biodiversity knowledge graph" is an implicit goal of several initiatives in biodiversity informatics. Indeed, taxonomic databases were early adopters of the Resource Description Framework (RDF) for describing entities and their interrelationships. From 2005 onwards, major databases of taxonomic names ("nomenclators") for plants, animals, and fungi have used Life Science Identifiers (LSIDs) [2] to uniquely identify those names. LSIDs can be dereferenced to return metadata in RDF [8], and several databases used the same vocabulary (developed by TDWG) to encode information about taxonomic names, their status (e.g., whether the names are in current use), and where the names were published. The use of globally unique identifiers that can be dereferenced, and which return data in a consistent, machine-readable format, would seem to satisfy the preconditions for creating a biodiversity knowledge graph [9].

Despite the obvious desirability of linking biodiversity data together [1], the biodiversity knowledge graph has yet to spontaneously assemble itself. Arguably the biggest reason is that there were few, if any, connections between taxonomic information and external data sources. For example, taxonomic databases typically cite the taxonomic literature using text strings, rather than persistent identifiers. Hence, we still have silos, albeit silos available in linked data formats.
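
To make the identifier scheme mentioned above concrete, the following minimal Python sketch splits an LSID into its components, assuming the general form urn:lsid:authority:namespace:object with an optional trailing revision. The `LSID` type, the `parse_lsid` function, and the `example.org` identifier are hypothetical names introduced only for illustration; they are not taken from any of the databases discussed here, and the sketch does not dereference the identifier or fetch RDF.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of a Life Science Identifier (hypothetical helper type)."""
    authority: str            # e.g. the domain of the issuing database
    namespace: str            # e.g. "names"
    object_id: str            # identifier local to the namespace
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = urn.strip().split(":")
    # An LSID has five colon-separated fields, or six with a revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not a valid LSID: {urn!r}")
    # The 'urn' and 'lsid' prefixes are treated as case-insensitive here.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a valid LSID: {urn!r}")
    # Authority, namespace, and object identifier must all be present.
    if any(field == "" for field in parts[2:5]):
        raise ValueError(f"not a valid LSID: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Illustrative identifier only; example.org is not a real nomenclator.
example = parse_lsid("urn:lsid:example.org:names:12345-1")
```

The point of the sketch is that the identifier itself only tells us which database issued it and which record it names. Any link from that record to a DOI or an ORCID has to come from the metadata the database returns, which is where the silo problem arises.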