2019
DOI: 10.3389/fdigh.2019.00004

Index-Driven Digitization and Indexation of Historical Archives

Abstract: The promise of digitization of historical archives lies in their indexation at the level of contents. Unfortunately, this kind of indexation does not scale, if done manually. In this article we present a method to bootstrap the deployment of a content-based information system for digitized historical archives, relying on historical indexing tools. Commonly prepared to search within homogeneous records when the archive was still current, such indexes were as widespread as they were disconnected, that is to say …

Citations: cited by 11 publications (6 citation statements)
References: 38 publications
“…We focused on (i) empirical, quantitative studies that were (ii) either peer-reviewed articles or completed PhD theses, (iii) published in English (iv) between 2000 and 2020. We chose this publication period as the digitization and indexing of repositories has been reliably applied since the early 2000s (Colavizza et al, 2019). Before the year 2000, digitization is less comprehensive, which might introduce biases in article selection-as some types of articles may be less likely to be digitized and, therefore, not included.…”
Section: Eligibility Criteria
Citation type: mentioning
confidence: 99%
“…Access to records' contents, in particular texts, in turn, allows using them for indexation. Colavizza [2019] proposes to automatically create archival information systems leveraging historical indexes, often produced when an archive was still current. Since these indexes often focus on entities such as persons, places, or keywords, which occur across archival fonds, they effectively allow users to search an archive in a complementary way to provenance and original order.…”
Section: Automatic Content Extraction and Indexation
Citation type: mentioning
confidence: 99%
“…Subsequently, the most frequently used system is the Stanford CRF classifier 35 [63], particularly on historical newspapers. Working with the press collection of the National Library of Australia, Kim et al [103] English data, and a custom one trained on 600 articles of the Trove collection (the time period of the sample is not specified).…”
Section: Training Models
Citation type: mentioning
confidence: 99%
“…As for historical material (cf. Figure 1), primary needs also revolve around retrieving documents and information, and NE processing is of similar importance [35]. There are less query logs over historical collections than for the contemporary web, but several studies demonstrate how prevalent entity names are in humanities users' searches: 80% of search queries on the national library of France's portal Gallica contain a proper name [33], and geographical and person names dominate the searches of various digital libraries, be they of artworks, domain-specific historical documents, historical newspapers, or broadcasts [14,32,92].…”
Section: Introduction
Citation type: mentioning
confidence: 99%