2012
DOI: 10.1080/19386389.2012.652566

An Assessment of Google Books’ Metadata

Abstract: This article reports on a study of error rates found in the metadata records of texts scanned by the Google Books digitization project. A review of the author, title, publisher, and publication year metadata elements for 400 randomly selected Google Books records was undertaken. The results show 36% of sampled books in the digitization project contained metadata errors. This error rate is higher than one would expect to find in a typical library online catalog.
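
As a point of reference only, the sketch below reproduces the arithmetic implied by the abstract's two figures (a sample of 400 records and a 36% error rate): the approximate number of affected records and a rough 95% confidence interval for the rate. The interval is not reported in the original study; it is a standard normal-approximation estimate added here purely for illustration.

```python
import math

sample_size = 400    # records reviewed, per the abstract
error_rate = 0.36    # share of sampled records with metadata errors, per the abstract

# Approximate count of sampled records containing at least one metadata error
records_with_errors = round(error_rate * sample_size)  # -> 144

# Rough 95% confidence interval for the error rate (normal approximation;
# NOT part of the original study, added only to illustrate sampling uncertainty)
z = 1.96
margin = z * math.sqrt(error_rate * (1 - error_rate) / sample_size)

print(f"Records with metadata errors: ~{records_with_errors} of {sample_size}")
print(f"Error rate: {error_rate:.0%} ± {margin:.1%} (95% CI, normal approximation)")
```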

Cited by 29 publications (7 citation statements)
References 5 publications

Citation statements (ordered by relevance):
“…These included a disproportionate number of books listing 1899 as their publication date; anachronistic dates for terms such as “internet”; mixups of author, editor, and/or translator; subject misclassification (e.g., using publishing industry classifications designed to allocate books to shelf space in stores, rather than Library of Congress subject headings); and mis-linking (e.g., mismatch between volume information and page images). James and Weiss’s (2012) quantitative assessment supports Nunberg’s anecdotal findings. In response, Google acknowledged that it had constructed book metadata records by parsing more than 100 sources of data (Orwant, 2009).…”
Section: Books Bite Back: Bookness as Bug, Not Feature (supporting)
confidence: 74%
“…GB includes some errors that were presumably generated by the automatic scanning process. James and Weiss (2012) examined metadata (e.g., author, title, publisher, publication year) from 400 randomly selected scanned texts, finding that 36% contained metadata errors. Of these errors, 41% were related to publishers’ names, 24% to authors’ names, 20% to publication dates, and 15% to titles.…”
Section: Introduction (mentioning)
confidence: 99%
“…See Deo (2015a). We are aware (see James and Weiss 2012) that Google Ngram Viewer has been largely criticized for poor OCR and incorrect metadata, such as the year of publication. This might have a huge impact on the interpretation of the results.…”
Section: Outline of Italian FC Indefinite Pronouns (mentioning)
confidence: 99%