Predicting Key Example Compounds in Competitors' Patent Applications Using Structural Information Alone

Hattori, Kazunari; Wakabayashi, Hisatsugu; Tamaki, Kenta

doi:10.1021/ci7002686

Cited by 33 publications

(48 citation statements)

References 34 publications

“…Examples of further data mining and cheminformatics methods applied to SureChEMBL compounds from patent documents are available as IPython Notebooks [16]. The notebooks aim to identify genuinely novel structures and series as claimed in a patent, along with methods to retrospectively flag key compounds [17, 18].

Fig.…”

Section: Results (mentioning)

confidence: 99%

Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents

et al. 2015

Background: First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited for this large corpus of documents is the automated extraction of chemical structures. A number of patent chemistry databases generated by using the latter approach are now available but little is known that can help to manage expectations when using them. This study aims to address this by comparing two such freely available sources, SureChEMBL and IBM SIIP (IBM Strategic Intellectual Property Insight Platform), with manually curated commercial databases.

Results: When looking at the percentage of chemical structures successfully extracted from a set of patents, using SciFinder as our reference, 59 and 51 % were also found in our comparison in SureChEMBL and IBM SIIP, respectively. When performing this comparison with compounds as starting point, i.e. establishing if for a list of compounds the databases provide the links between chemical structures and patents they appear in, we obtained similar results. SureChEMBL and IBM SIIP found 62 and 59 %, respectively, of the compound-patent pairs obtained from Reaxys.

Conclusions: In our comparison of automatically generated vs. manually curated patent chemistry databases, the former successfully provided approximately 60 % of links between chemical structure and patents. It needs to be stressed that only a very limited number of patents and compound-patent pairs were used for our comparison. Nevertheless, our results will hopefully help to manage expectations of users of patent chemistry databases of this type and provide a useful framework for more studies like ours as well as guide future developments of the workflows used for the automated extraction of chemical structures from patents. The challenges we have encountered whilst performing this study highlight that more needs to be done to make such assessments easier. Above all, more adequate, preferably open access to relevant ‘gold standards’ is required.

Electronic supplementary material: The online version of this article (doi:10.1186/s13321-015-0097-z) contains supplementary material, which is available to authorized users.

“…We were eventually able to download approximately 10 K structures (at approximately US$1 per structure) and view a project chemical landscape where links to the literature were associated with structure, and structures were grouped by similarity. This project eventually died from lack of senior management support, although apparently similar ideas did surface elsewhere within the organization, as structures from patents were used to gauge the relative importance of claimed compounds [117].…”

Section: The Contribution Of Purification and Analytical Sciences To Ht (mentioning)

confidence: 99%

The Essential Roles of Chemistry in High-Throughput Screening Triage

2014

It is increasingly clear that academic high-throughput screening (HTS) and virtual HTS triage suffers from a lack of scientists trained in the art and science of early drug discovery chemistry. Many recent publications report the discovery of compounds by screening that are most likely artifacts or promiscuous bioactive compounds, and these results are not placed into the context of previous studies. For HTS to be most successful, it is our contention that there must exist an early partnership between biologists and medicinal chemists. Their combined skill sets are necessary to design robust assays and efficient workflows that will weed out assay artifacts, false positives, promiscuous bioactive compounds and intractable screening hits, efforts that ultimately give projects a better chance at identifying truly useful chemical matter. Expertise in medicinal chemistry, cheminformatics and purification sciences (analytical chemistry) can enhance the post-HTS triage process by quickly removing these problematic chemotypes from consideration, while simultaneously prioritizing the more promising chemical matter for follow-up testing. It is only when biologists and chemists collaborate effectively that HTS can manifest its full promise.

“…In addition to their direct value in intellectual property licensing (15) and competitive business analysis (16), patents can serve as a useful resource for a variety of academic research. We examine the datasets used in previous research and show that the data available in SCRIPDB is sufficient, quantitatively and qualitatively, to provide value for future investigations.…”

Section: Discussion (mentioning)

confidence: 99%

SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents

Heifets

Jurišica

2011

Nucleic Acids Research

The patent literature is a rich catalog of biologically relevant chemicals; many public and commercial molecular databases contain the structures disclosed in patent claims. However, patents are an equally rich source of metadata about bioactive molecules, including mechanism of action, disease class, homologous experimental series, structural alternatives, or the synthetic pathways used to produce molecules of interest. Unfortunately, this metadata is discarded when chemical structures are deposited separately in databases. SCRIPDB is a chemical structure database designed to make this metadata accessible. SCRIPDB provides the full original patent text, reactions and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how such information is valuable in medical text mining, chemical image analysis, reaction extraction and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure or molecular similarity and the results may be restricted to patents describing synthetic routes. SCRIPDB is available at http://dcv.uhnres.utoronto.ca/SCRIPDB.
