Abstract. Prior-art search is a critical step in the examination procedure of a patent application. This study explores automatic query generation from patent documents to facilitate the time-consuming and labor-intensive search for relevant patents. It is essential for this task to identify discriminative terms in different sections of a query patent, which enable us to distinguish relevant patents from non-relevant patents. To this end we investigate the term distribution of words occurring in different sections of the query patent and compare them with the rest of the collection using language modeling estimation techniques. We experiment with term weighting based on the Kullback-Leibler divergence between the query patent and the collection and also with parsimonious language model estimation. Both of these techniques promote words that are common in the query patent and are rare in the collection. We also incorporate the classification assigned to patent documents into our model, to exploit the available human judgements in the form of a hierarchical classification. Experimental results show the effectiveness of generated queries particularly in terms of recall while patent description showed to be the most useful source for extracting terms.
Patent prior art search is a type of search in the patent domain where documents are searched for that describe the work previously carried out related to a patent application. The goal of this search is to check whether the idea in the patent application is novel. Vocabulary mismatch is one of the main problems of patent retrieval which results in low retrievability of similar documents for a given patent application. In this paper we show how the term distribution of the cited documents in an initially retrieved ranked list can be used to address the vocabulary mismatch. We propose a method for query modeling estimation which utilizes the citation links in a pseudo relevance feedback set. We first build a topic dependent citation graph, starting from the initially retrieved set of feedback documents and utilizing citation links of feedback documents to expand the set. We identify the important documents in the topic dependent citation graph using a citation analysis measure. We then use the term distribution of the documents in the citation graph to estimate a query model by identifying the distinguishing terms and their respective weights. We then use these terms to expand our original query. We use CLEF-IP 2011 collection to evaluate the effectiveness of our query modeling approach for prior art search. We also study the influence of different parameters on the performance of the proposed method. The experimental results demonstrate that the proposed approach significantly improves the recall over a state-of-the-art baseline which uses the link-based structure of the citation graph but not the term distribution of the cited documents.
Prior art search or recommending citations for a patent application is a challenging task. Many approaches have been proposed and shown to be useful for prior art search. However, most of these methods do not consider the network structure for integrating and diffusion of different kinds of information present among tied patents in the citation network. In this paper, we propose a method based on a timeaware random walk on a weighted network of patent citations, the weights of which are characterized by contextual similarity relations between two nodes on the network. The goal of the random walker is to find influential documents in the citation network of a query patent, which can serve as candidates for drawing query terms and bigrams for query refinement. The experimental results on CLEF-IP datasets (CLEF-IP 2010 and CLEF-IP 2011) show the effectiveness of encoding contextual similarities (common classification codes, common inventor, and common applicant) between nodes in the citation network. Our proposed approach can achieve significantly better results in terms of recall and Mean Average Precision rates compared to strong baselines of prior art search.
Patent prior art search is a task in patent retrieval with the goal of finding documents which describe prior art work related to a query patent. A query patent is a full patent application composed of hundreds of terms which does not represent a single focused information need. Fortunately, other relevance evidence sources (i.e., classification tags and bibliographical data) provide additional details about the underlying information need. In this article, we propose a unified framework that integrates multiple relevance evidence components for query formulation. We first build a query model from the textual fields of a query patent. To overcome the term mismatch, we expand this initial query model with the term distribution of documents in the citation graph, modeling old and recent domain terminology. We build an IPC lexicon and perform query expansion using this lexicon incorporating proximity information. We performed an empirical evaluation on two patent datasets. Our results show that employing the temporal features of documents has a precision enhancing effect, while query expansion using IPC lexicon improves the recall of the final rank list. ACM Reference Format:Parvaz Mahdabi and Fabio Crestani. 2014. Patent query formulation by synthesizing multiple sources of relevance evidence.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.