2012
DOI: 10.1080/19386389.2012.652566

An Assessment of Google Books’ Metadata

Abstract: This article reports on a study of error rates found in the metadata records of texts scanned by the Google Books digitization project. A review of the author, title, publisher, and publication year metadata elements for 400 randomly selected Google Books records was undertaken. The results show 36% of sampled books in the digitization project contained metadata errors. This error rate is higher than one would expect to find in a typical library online catalog.
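
As a point of reference only, the sketch below reproduces the arithmetic implied by the abstract's two figures (a sample of 400 records and a 36% error rate): the approximate number of affected records and a rough 95% confidence interval for the rate. The interval is not reported in the original study; it is a standard normal-approximation estimate added here purely for illustration.

```python
import math

sample_size = 400    # records reviewed, per the abstract
error_rate = 0.36    # share of sampled records with metadata errors, per the abstract

# Approximate count of sampled records containing at least one metadata error
records_with_errors = round(error_rate * sample_size)  # -> 144

# Rough 95% confidence interval for the error rate (normal approximation;
# NOT part of the original study, added only to illustrate sampling uncertainty)
z = 1.96
margin = z * math.sqrt(error_rate * (1 - error_rate) / sample_size)

print(f"Records with metadata errors: ~{records_with_errors} of {sample_size}")
print(f"Error rate: {error_rate:.0%} ± {margin:.1%} (95% CI, normal approximation)")
```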

Cited by 29 publications (7 citation statements)
References 5 publications

Citation statements (ordered by relevance):
“…These included a disproportionate number of books listing 1899 as their publication date; anachronistic dates for terms such as “internet”; mixups of author, editor, and/or translator; subject misclassification (e.g., using publishing industry classifications designed to allocate books to shelf space in stores, rather than Library of Congress subject headings); and mis-linking (e.g., mismatch between volume information and page images). James and Weiss’s (2012) quantitative assessment supports Nunberg’s anecdotal findings. In response, Google acknowledged that it had constructed book metadata records by parsing more than 100 sources of data (Orwant, 2009).…”
Section: Books Bite Back: Bookness as Bug, Not Feature (supporting)
confidence: 74%
“…GB includes some errors that were presumably generated by the automatic scanning process. James and Weiss (2012) examined metadata (e.g., author, title, publisher, publication year) from 400 randomly selected scanned texts, finding that 36% contained metadata errors. Of these errors, 41% were related to publishers’ names, 24% to authors’ names, 20% to publication dates, and 15% to titles.…”
Section: Introduction (mentioning)
confidence: 99%
“…See Deo (2015a). We are aware (see James and Weiss 2012) that Google Ngram Viewer has been largely criticized for poor OCR and incorrect metadata, such as the year of publication. This might have a huge impact on the interpretation of the results.…”
Section: Outline of Italian FC Indefinite Pronouns (mentioning)
confidence: 99%