Purpose
– The purpose of this paper is to provide much-needed data to staff working in archival digitization on the costs and benefits of visual checks during quality control workflows, and to encourage those in the field to take a data-driven approach to planning and workflow development as they transition to large-scale digitization.
Design/methodology/approach
– This is a case study of a cost-benefit analysis at the Triangle Research Libraries Network. Data were tracked on the time spent performing visual checks relative to scanning production, and on error types and discovery rates, for the consortial grant “Content, context, and capacity: a collaborative large-scale digitization project on the long civil rights movement in North Carolina”.
Findings
– Findings show that 85 percent of time was spent scanning and 15 percent was spent on quality control, with visual checks of every scan. Only one error was discovered for every 223 scans reviewed (0.4 percent of scans). Of the six types of error identified, only half cause critical user experience issues, and only 32 percent of all errors detected fell into the critical category. One critical error was found for every 700 scans (0.1 percent of scans). If all the time spent performing visual checks had instead been spent on scanning, production would have increased by 18 percent. Folders with 100 or more scans comprised only 11.5 percent of all folders, and 37 percent of folders in this group contained errors (for comparison, only 8 percent of folders with 50 or more scans contained errors). Additionally, 52 percent of all critical errors occurred in folders with 100 or more scans. Errors in these large folders represented 30 percent of total errors, and performing visual checks on them required 32 percent of all visual check time.
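The arithmetic behind several of these figures can be retraced with a short sketch. It is an illustration only, not the authors' calculation: it assumes that scanning throughput per hour stays constant when visual-check time is reallocated, and every variable name is hypothetical. The inputs are the rounded percentages reported above, so the results match the reported figures only approximately (for example, 17.6 percent against the reported 18 percent).

```python
# Rounded figures taken from the findings; all names are illustrative.
scan_share = 0.85          # share of time spent scanning
qc_share = 0.15            # share of time spent on visual checks
scans_per_error = 223      # scans reviewed per error discovered
critical_fraction = 0.32   # share of detected errors that were critical
large_folder_qc_share = 0.32  # share of visual-check time spent on folders with 100+ scans

# Assumption: throughput per hour is constant, so moving all visual-check
# time to scanning raises production by qc_share / scan_share.
production_gain = qc_share / scan_share

# If 32 percent of errors are critical, one critical error turns up about
# every scans_per_error / critical_fraction scans.
scans_per_critical_error = scans_per_error / critical_fraction

# Share of total time (scanning plus checks) that checking only the
# folders with 100 or more scans would take.
large_folder_time_share = large_folder_qc_share * qc_share

print(f"production gain without visual checks: {production_gain:.1%}")
print(f"overall error rate: {1 / scans_per_error:.2%}")
print(f"scans per critical error: {scans_per_critical_error:.0f}")
print(f"total time spent checking large folders: {large_folder_time_share:.1%}")
```

Read this way, checking only the folders with 100 or more scans would take roughly 5 percent of total time, and those folders held 52 percent of the critical errors reported above.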
Practical implications
– The data gathered during this research can be repurposed by others wishing to consider or conduct a cost-benefit analysis of visual check workflows for large-scale digitization.
Originality/value
– To the authors' knowledge, this is the only available dataset on the rate of error detection and error type compared to the time spent on quality control visual checks in digitization.