Mining Patents Using Molecular Similarity Search

Rhodes, J.; Boyer, Stephen; Kreulen, Jeffrey; Chen, Ying; Ordóñez, Patricia

doi:10.1142/9789812772435_0029

Cited by 30 publications

(25 citation statements)

References 9 publications


“…[14][15][16][17]), often a rate limiting step in workflows [15]. It was so in our use of a patent data base generated by using Blue Gene to read automatically all US patents [16,17]. The 6.7 million records comprise SMILES code [18] for compounds mentioned 1 along with assignee, and also the patent reference by which other data can be joined.…”

Section: Introduction (mentioning)

confidence: 99%

Drug discovery using very large numbers of patents. General strategy with extensive use of match and edit operations

Robson, Dettinger, et al. 2011. J Comput Aided Mol Des


A patent data base of 6.7 million compounds generated by a very high performance computer (Blue Gene) requires new techniques for exploitation when extensive use of chemical similarity is involved. Such exploitation includes the taxonomic classification of chemical themes, and data mining to assess mutual information between themes and companies. Importantly, we also launch candidates that evolve by "natural selection" as failure of partial match against the patent data base and their ability to bind to the protein target appropriately, by simulation on Blue Gene. An unusual feature of our method is that algorithms and workflows rely on dynamic interaction between match-and-edit instructions, which in practice are regular expressions. Similarity testing by these uses SMILES strings and, less frequently, graph or connectivity representations. Examining how this performs in high throughput, we note that chemical similarity and novelty are human concepts that largely have meaning by utility in specific contexts. For some purposes, mutual information involving chemical themes might be a better concept.
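The match-and-edit idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three records, the assignee names, the `THEME` pattern, and the helper functions are hypothetical and are not taken from the paper. Exact string comparison stands in for the paper's "failure of partial match" test. Note that a regular expression sees only the text of a SMILES string, so the same substructure written in a different atom order is missed.

```python
import re

# Hypothetical toy records (SMILES, assignee); not real patent data.
RECORDS = [
    ("CC(=O)Oc1ccccc1C(=O)O", "AssigneeA"),  # aspirin
    ("OC(=O)c1ccccc1O", "AssigneeA"),        # salicylic acid
    ("CC(=O)Nc1ccc(O)cc1", "AssigneeB"),     # paracetamol
]

# A chemical "theme" written as a regular expression over SMILES text:
# a benzene ring spelled c1ccccc1 directly followed by a carboxylic acid.
THEME = re.compile(r"c1ccccc1C\(=O\)O")


def match_theme(records, pattern):
    """Match step: return the records whose SMILES text contains the theme."""
    return [(smiles, assignee) for smiles, assignee in records
            if pattern.search(smiles)]


def edit(smiles, pattern, replacement):
    """Edit step: rewrite the matched fragment to propose a candidate."""
    return pattern.sub(replacement, smiles)


def is_unmatched(candidate, records):
    """Simplified novelty test: the candidate string is absent from the records."""
    return all(candidate != smiles for smiles, _ in records)


hits = match_theme(RECORDS, THEME)
# Swap the acid for a primary amide to propose a candidate string.
candidate = edit(hits[0][0], THEME, "c1ccccc1C(=O)N")
print(hits)
print(candidate, is_unmatched(candidate, RECORDS))
```

In this toy set only the aspirin record matches. Salicylic acid also carries a carboxylic acid on a benzene ring, but its SMILES string writes the group before the ring, so the textual pattern does not find it; a practical workflow would need canonical SMILES or a graph representation to close that gap.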


“…Several chemo-informatics tools to analyze chemical similarities between small-molecules are available (Medina-Franco et al, 2007;Miller, 2002;Rhodes et al, 2007).…”

Section: Introduction I (mentioning)

confidence: 99%

Identifying Network of Drug Mode of Action by Gene Expression Profiling

Iorio, Tagliaferri, Bernardo. 2009. Journal of Computational Biology


Drug mode of action (MOA) of novel compounds has been predicted using phenotypic features or, more recently, by comparing side effect similarities. Attempts to use gene expression data in mammalian systems have so far met with limited success. Here, we built a drug similarity network starting from a public reference dataset containing genome-wide gene expression profiles (GEPs) following treatments with more than a thousand compounds. In this network, drugs sharing a subset of molecular targets are connected by an edge or lie in the same community. Our approach is based on a novel similarity distance between two compounds. The distance is computed by combining GEPs via an original rank-aggregation method, followed by a gene set enrichment analysis (GSEA) to compute similarity between pairs of drugs. The network is obtained by considering each compound as a node, and adding an edge between two compounds if their similarity distance is below a given significance threshold. We show that, despite the complexity and the variety of the experimental conditions, our approach is able to identify similarities in drug mode of action from GEPs. Our approach can also be used for the identification of the MOA of new compounds.
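The network construction described in this abstract (one node per compound, an edge when the pairwise distance falls below a threshold) can be sketched as follows. The drug names, distance values, and the 0.5 threshold are invented for illustration, and the rank-aggregation and GSEA steps that produce the real distances are not reproduced. Connected components are used here as a simple stand-in for communities; the abstract does not say which community method the authors used.

```python
# Hypothetical pairwise distances between four compounds (illustrative values).
DISTANCES = {
    ("drugA", "drugB"): 0.20,
    ("drugA", "drugC"): 0.90,
    ("drugA", "drugD"): 0.95,
    ("drugB", "drugC"): 0.85,
    ("drugB", "drugD"): 0.80,
    ("drugC", "drugD"): 0.30,
}


def build_network(distances, threshold):
    """Return an adjacency map with an edge wherever distance < threshold."""
    adjacency = {}
    for (a, b), d in distances.items():
        # Register both compounds as nodes even if they end up isolated.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if d < threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group nodes reachable from one another (stand-in for communities)."""
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(adjacency[node] - seen)
        groups.append(sorted(group))
    return groups


network = build_network(DISTANCES, threshold=0.5)
print(connected_components(network))
```

With these made-up values, only the drugA/drugB and drugC/drugD pairs fall below the threshold, so the sketch yields two groups of two compounds each.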


“…the main problem is the relevant, prevalent, and perennial one of what is meant by the similarity of compounds. The general discipline tackling these and related issues is often called molecule mining [14] for a comprehensive bibliography, and [15].…”

Section: Information From Patents (mentioning)

confidence: 99%

“…This turns out to be insightful, nonetheless, in regard to readdressing the concepts of similarity and novelty. The initial aim of our project was to provide complementary tools to support patent based chemoinformatics systems developed by our colleagues [15,18]. The overall study with IBM colleagues involved using very high performance computing to read all US patents at that time, and to analyze a patent data base generated consisting of 6.7 million compounds re-expressed in SMILES codes [19] as character strings that represent the chemical formulae of compounds, alongside assignee and patent reference.…”

Section: Scope and Utility (mentioning)

confidence: 99%

The Concept of Novel Compositions of Matter: A Theoretical Analysis

Robson. 2014. Intel Prop Rights


This review discusses the nature and uses of information measures in the discipline of patenting. From one perspective, the information content in a patent diminishes rapidly as the broadness of the claims increases. Claims made by Markush representations facilitate the quantification of that. The equations approach zero information if a massive number of chemical themes is implied. Importantly, a more detailed examination of these equations has implications that allow discussion of various aspects of novelty, reasonable consistency with a specific purpose, and perhaps even how many arguments and counterarguments there should be between examiner and assignee.
