2023
DOI: 10.5334/johd.95

MultiHATHI: A Complete Collection of Multilingual Prose Fiction in the HathiTrust Digital Library

Abstract: This dataset provides detailed metadata on ca. 10.2 million works of fiction and nonfiction written after 1799 in 521 different languages available in the HathiTrust Digital Library. The dataset bolsters the May 2022 Hathifile by supplying missing predicted fiction tags with a bespoke BERT-based multilingual classifier. Our classifier completes the catalogue with an additional 400,000 non-English volumes predicted to be works of fiction, capturing 95% of all works presently provided by HathiTrust. We provide e…

Cited by 1 publication (1 citation statement)
References 8 publications
“…200,000 fictional narratives in English in the Hathi Trust Digital Library that has been refined and updated by to include a comparison corpus of non-fiction prose across 1.5 million sampled pages published since 1800. Hamilton and Piper (2023) extends this work to include multilingual fiction annotation across 521 different languages. Erlin et al (2022) provide metadata on translations of fiction into English from 120 different languages also located in the Hathi Trust.…”
Section: Narrative Infrastructures (mentioning)
confidence: 90%