2020
DOI: 10.3897/rio.6.e57602

Landscape Analysis for the Specimen Data Refinery

Abstract: This report reviews the current state-of-the-art applied approaches on automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems; and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisa…

Cited by 21 publications (11 citation statements)
References 43 publications
“…This Level 1 workflow provenance [68] can be expressed generally across workflow languages with minimal workflow engine changes, with the option of more detailed provenance traces as separate PROV artefacts in the RO-Crate as data entities. In the current development of Specimen Data Refinery [122] these RO-Crates will document the text recognition workflow runs of digitised biological specimens, exposed as FAIR Digital Objects [38].…”
Section: Profile For Describing Workflows (mentioning)
confidence: 99%
“…Tools of particular interest span the fields of computer vision, optical character recognition, handwriting recognition, named entity recognition and language translation. Workflow technologies from the ELIXIR Research Infrastructure [19], including Galaxy [20], Common Workflow Language [21], Research Object Crates (RO-Crates) [22,23] and WorkflowHub [24], and selected tools are integrated in a cloud-based workflow platform for natural history specimens, the 'Specimen Data Refinery' [1], that will become one of the main services to be offered by the planned DiSSCo research infrastructure [5]. The tools themselves, implemented with findable, accessible, interoperable, and reusable (FAIR) characteristics [25] are packaged into canonical workflow component libraries [26], rendering them reusable, and interoperable with one another.…”
Section: The Specimen Data Refinery: a Canonical Workflow Framework... (mentioning)
confidence: 99%
“…While natural history collections are heterogeneous in size and shape, often they are mass digitized using standardised workflows [9,10,11,12,13]. In pursuit of higher throughput at lower cost, yet with higher accuracy and richer metadata, further automation will increasingly rely on techniques of object detection and segmentation, optical character recognition (OCR) and semantic processing of labels, and automated taxonomic identification and visual feature analysis [1,18].…”
Section: Workflows For Processing Specimen Images and Extracting Data (mentioning)
confidence: 99%
“…The terms robotics, robot and automation are also now frequently applied to process automation using software, without any mechanical or physical components. Software automation is outside the scope of this paper, as it is addressed in related reports including the ICEDIG report on automated text digitisation (Owen et al 2020), and the SYNTHESYS+ Specimen Data Refinery (Walton et al 2020). Software automation is already in use across many aspects of digitisation, including image processing, batch quality control, barcode detection and image segmentation (Allan et al 2019, Hudson et al 2015).…”
Section: Definitions and Scope (mentioning)
confidence: 99%