2020
DOI: 10.3386/w27987

Reliance on Science by Inventors: Hybrid Extraction of In-text Patent-to-Article Citations

Abstract: We curate and characterize a complete set of citations from patents to scientific articles, including nearly 16 million from the full text of USPTO and EPO patents. Combining heuristics and machine learning, we achieve 25% higher performance than machine learning alone. At 99.4% accuracy, coverage of 87.6% is achieved, and coverage above 90% with accuracy above 93%. Performance is evaluated with a set of 5,939 randomly-sampled, cross-verified "known good" citations, which the authors have never seen. We compar…

Cited by 7 publications (7 citation statements)
References 25 publications
“…We also collect 408,348 patents related to climate change mitigation in buildings, carbon capture and storage (CCS), energy-saving information and communication technology (ICT), energy, clean production, transport and waste (see Veefkind et al, 2012; Angelucci et al, 2018). We supplement the data with data on patent citations and co-classifications (Hötte et al, 2021a), science citations (Marx and Fuegi, 2019; Marx and Fuegi, 2020c; Marx and Fuegi, 2020b) and public R&D support (Fleming et al, 2019b; Fleming et al, 2019a) (see Sec. 4.1).…”
Section: Methods (mentioning)
confidence: 99%
“…To analyze the technological and scientific knowledge base, we use data on (1) citations from patents to patents from Pichler et al, 2020 and (2) citations from patents to science provided by Marx and Fuegi, 2020b. For a description of these data and the use of science citations see Hötte et al, 2021b and Fuegi, 2019.…”
Section: Data Sources (mentioning)
confidence: 99%
“…Further, the text of patents themselves often contains information written by inventors about the precursors of their invention. Standardized datasets of scientific references in patent text now exist (Bryan, Ozcan, and Sampat, 2020;Marx and Fuegi, 2020), but more complex uses of natural language data to study technology diffusion are underexploited.…”
Section: Measuring Diffusion (mentioning)
confidence: 99%
“…Not surprisingly, the "Google translate" paper has 31 coauthors, many of them leading researchers in their respective fields (Wu et al, 2016). This seems to be a broader trend separating university research from industry research in this area: using data from Marx (2019), we examined the average number of coauthors in the five leading machine learning conferences in Hartmann and Henkel (2019) from 2011 to 2018 and found that research by large firms features on average one more co-author (4.3) than non-large firm papers (3.4). These firms make up 10 percent (2,168 out of 20,989) of the papers published with fewer than eleven authors, but comprise 28 percent (22 out of 79) of the papers published with more than eleven authors.…”
Section: Corporate Labs Solve Practical Problems (mentioning)
confidence: 99%