2014
DOI: 10.3897/phytokeys.38.7168

The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels

Abstract: At the Royal Botanic Garden Edinburgh (RBGE) the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text prior to being sorted based on Collector and/or Country. Using images of the specimens, a team of six digitisers then added data to the specimen records. To investigate whether the data from OCR…

Cited by 29 publications (21 citation statements)
References 15 publications
“…Optical character recognition may help sorting entries by collector or country (Drinkwater et al, 2014), as might the development of semi-automated imaging pipelines (Tegelberg et al, 2014). Other projects use citizen science approaches to transcribe specimen labels ((Hill et al, 2012); https://www.…”
Section: Box 3 Digitization Challenge (mentioning)
confidence: 99%
“…In particular, our approach involved a two-step process of automated data scraping followed by curation by hand and quality assurance. Overall, we found that OCR was an efficient method for reducing the labor associated with transcribing analog text records (e.g., Drinkwater et al, 2014). Unfortunately, OCR technology does not have absolute accuracy.…”
Section: Discussion (mentioning)
confidence: 99%
“…Considering that the ideal is to digitise collections once, it is critical that the digitisation process does not limit the eventual uses of the images. Potential applications include basic ones, such as determining the identity of the specimen and reading the label details but they might also include automated extraction of character traits using pattern recognition or information extraction from labels and annotations through optical character recognition (Corney et al 2012, Drinkwater et al 2014, Corney et al 2018). More advanced uses of images demand higher quality images and the increased usefulness must be balanced against the additional costs of capture and storage.…”
Section: Introduction (mentioning)
confidence: 99%