2022
DOI: 10.21203/rs.3.rs-1506561/v1
Preprint

Computational Metadata Generation Methods for Biological Specimen Image Collections

Abstract: Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens that can be found online. Unfortunately, the available metadata is often sparse and, at times, erroneous. This paper extends previous research with the Illinois Natural History Survey (INHS) collection (7,244 specimen images) using computational approaches to analyze image quality, and then automatically generate 22 metadata properties representing the image quality and…
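The abstract's core idea, deriving image-quality metadata computationally instead of relying on sparse hand-entered records, can be illustrated with a minimal sketch. The property names, measurements, and directory layout below are illustrative assumptions, not the 22 properties the paper itself defines.

```python
# Minimal sketch: compute a few hypothetical image-quality properties for each
# specimen image and write them out as JSON metadata. The specific properties
# and thresholds are assumptions for illustration only.
import json
from pathlib import Path

import numpy as np
from PIL import Image


def image_quality_metadata(image_path: Path) -> dict:
    """Compute simple, hypothetical quality properties for one specimen image."""
    with Image.open(image_path) as img:
        rgb = np.asarray(img.convert("RGB"), dtype=np.float32)

    gray = rgb.mean(axis=2)  # naive luminance
    return {
        "file_name": image_path.name,
        "width": int(rgb.shape[1]),
        "height": int(rgb.shape[0]),
        "mean_brightness": float(gray.mean()),  # 0-255 scale
        "contrast_std": float(gray.std()),      # spread of pixel values
        "likely_grayscale": bool(
            np.allclose(rgb[..., 0], rgb[..., 1])
            and np.allclose(rgb[..., 1], rgb[..., 2])
        ),
    }


if __name__ == "__main__":
    Path("metadata").mkdir(exist_ok=True)
    for path in sorted(Path("images").glob("*.jpg")):  # assumed input layout
        metadata = image_quality_metadata(path)
        (Path("metadata") / f"{path.stem}.json").write_text(
            json.dumps(metadata, indent=2)
        )
```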


Cited by 3 publications (1 citation statement)
References 0 publications
“…The first ML component (Figures 1 and 2, step 4a) performs object detection and metadata generation as defined by the rule generate_metadata from the BGNN_Core_Workflow (Tabarin, Bradley, Balk, & Lapp, 2023a; Table 1). This rule calls the container drexel_metadata from the drexel_metadata repository (Karnani et al., 2023; Table 1), which generates a metadata file for each image datum (named ARKID.json). The codebase has two outputs and is described in Leipzig et al.…”
Section: Methods
confidence: 99%
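The quoted workflow step, a rule that runs an object-detection and metadata-generation container over each image and writes one ARKID.json file per image datum, might look roughly like the Python driver sketched below. The directory names, container image name, and command-line arguments are hypothetical stand-ins; the actual generate_metadata rule is defined in the BGNN_Core_Workflow and calls the drexel_metadata container.

```python
# Minimal sketch of a per-image metadata-generation step, assuming a container
# that takes an input image path and an output JSON path. The container name and
# its arguments are hypothetical, not the real drexel_metadata interface.
import subprocess
from pathlib import Path

IMAGE_DIR = Path("images")       # assumed layout: one specimen image per file
METADATA_DIR = Path("metadata")  # one ARKID.json per image, as in the quote


def generate_metadata(image_path: Path) -> Path:
    """Run the (hypothetical) metadata container on one image datum."""
    METADATA_DIR.mkdir(exist_ok=True)
    out_path = METADATA_DIR / f"{image_path.stem}.json"  # file named by its ARK ID
    subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", f"{IMAGE_DIR.resolve()}:/in",
            "-v", f"{METADATA_DIR.resolve()}:/out",
            "metadata-generator:latest",  # hypothetical container image
            f"/in/{image_path.name}", f"/out/{out_path.name}",
        ],
        check=True,
    )
    return out_path


if __name__ == "__main__":
    for image in sorted(IMAGE_DIR.glob("*.jpg")):
        print("wrote", generate_metadata(image))
```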