Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries 2016
DOI: 10.1145/2910896.2925465

Inventor Name Disambiguation for a Patent Database Using a Random Forest and DBSCAN

Cited by 13 publications (18 citation statements)
References 6 publications
“…Patent data are an important resource for monitoring and evaluating scientific and technical work across a range of areas (Wang et al., 2015; Zhu and Porter, 2002) such as evaluating the productivity of an inventor or an institution (Czarnitzki et al., 2007; Rahal and Rabelo, 2006), analyzing inventor migration patterns (Doherr, 2017), exploring the economic issues associated with innovation (Miguélez and Gómez-Miguélez, 2011) or assessing the influence of collaborative networks on innovation (Fleming et al., 2007). However, most patent databases do not allocate a unique identifier to each inventor (Kim et al., 2016; Li et al., 2014). As a result, distinguishing between inventors with the same name is a highly challenging task.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, by 2013, there were more than 32 trillion pairs of records and 8 million patents in the United States Patent and Trademark Office (USPTO) database, an impossible number to disambiguate manually. According to the US census data, common names, such as John Smith, are used by about 53 million people, which is equal to 16.4 per cent of the US population (Kim et al., 2016). Furthermore, 51.1 per cent of the inventors do not include a middle name (Akinsanmi et al., 2011).…”
Section: Introduction (mentioning)
confidence: 99%
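
To make the scale in the statement above concrete: comparing every record with every other record means n(n - 1)/2 unordered pairs. The sketch below only illustrates that arithmetic. The record count of 8 million is an assumption chosen for illustration (the quote gives 8 million patents, not a record count), and the function name `candidate_pairs` is hypothetical.

```python
from math import comb


def candidate_pairs(n_records: int) -> int:
    """Return the number of unordered record pairs, n * (n - 1) / 2."""
    return comb(n_records, 2)


# Assumed record count, for illustration only; not a figure from the source.
n = 8_000_000
pairs = candidate_pairs(n)
print(f"{n:,} records -> {pairs:,} candidate pairs")
```

With this assumed count the result is on the order of 3.2 × 10^13 pairs, the same order of magnitude as the figure quoted above, which is why exhaustive manual comparison is not feasible.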
“…In these models, documents are represented as bag‐of‐words with traditional features, including term frequency, distribution over terms, and time feature. Based on the bag‐of‐words model, many models that typically compute similarities among documents have been developed for topic detection, such as CLARANS and DBSCAN. The next generation of topic detection models extended the analysis from directly clustering documents to clustering keywords.…”
Section: Introduction (mentioning)
confidence: 99%
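
As a rough illustration of the idea in the statement above (density-based clustering of documents by bag-of-words similarity), here is a minimal DBSCAN sketch in plain Python. It is not the implementation used by the cited or citing papers. The toy documents, the cosine distance, and the parameter values `eps=0.3` and `min_pts=2` are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt

NOISE = -1


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def dbscan(points, eps, min_pts, dist):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1

    def neighbors(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


# Toy documents, invented for this example.
docs = [
    "patent inventor name disambiguation",
    "inventor name disambiguation patent database",
    "topic detection news stream",
    "news stream topic detection model",
    "recipe for bread",
]
bags = [Counter(d.split()) for d in docs]
labels = dbscan(bags, eps=0.3, min_pts=2, dist=cosine_distance)
print(labels)
```

In this toy run the two patent documents and the two topic-detection documents form separate clusters, and the unrelated document is left as noise. The neighbor search here is quadratic, so the sketch is only suitable for small inputs.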
“…Then we analyze matches among ambiguous pairs. Instead of using a simple heuristic, we use a binary Random Forest (RF) classifier, which has been used for evaluating matches in the author and inventor name disambiguation [3], [4], [5]. Features are extracted from common attributes from the data sources.…”
Section: Introduction (mentioning)
confidence: 99%
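
The statement above says features are extracted from attributes that a pair of records has in common and then passed to a binary classifier. A minimal sketch of that feature-extraction step might look like the following. The record fields (`name`, `city`, `assignee`, `coinventors`) and the four features are hypothetical, chosen for illustration; they are not the feature set of the cited paper. The classifier itself is not shown.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(rec_a: dict, rec_b: dict) -> list:
    """Build one feature vector for a pair of records (hypothetical fields)."""
    co_a, co_b = set(rec_a["coinventors"]), set(rec_b["coinventors"])
    union = co_a | co_b
    # Jaccard overlap of co-inventor sets; 0.0 when both sets are empty.
    jaccard = len(co_a & co_b) / len(union) if union else 0.0
    return [
        name_similarity(rec_a["name"], rec_b["name"]),
        1.0 if rec_a["city"] == rec_b["city"] else 0.0,
        1.0 if rec_a["assignee"] == rec_b["assignee"] else 0.0,
        jaccard,
    ]


# Two invented records, for illustration only.
rec_1 = {"name": "John Smith", "city": "Austin", "assignee": "Acme",
         "coinventors": ["A. Lee", "B. Chen"]}
rec_2 = {"name": "John A. Smith", "city": "Austin", "assignee": "Initech",
         "coinventors": ["B. Chen", "C. Diaz"]}
features = pair_features(rec_1, rec_2)
print(features)
```

Each labeled pair would contribute one such vector plus a match or non-match label, which is the input shape a binary classifier such as a Random Forest expects.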