2011
DOI: 10.1021/ci100384d

Chemical Name to Structure: OPSIN, an Open Source Solution

Abstract: We have produced an open source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature in a fast and precise manner. This has been achieved using an approach based on a regular grammar. This grammar is used to guide tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed that is operated on in a stepwise manner until the structure has been reconstru…
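
The abstract's idea of letting a regular grammar guide tokenization can be illustrated with a minimal sketch. This is not OPSIN's code or grammar: the tiny lexicon, the token classes, and the state names below are all hypothetical, chosen only to show how a grammar can rule out a wrong split (for example, reading "meth" as a stem when the name actually starts with the substituent "methyl").

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon (NOT OPSIN's): token class -> surface strings.
LEXICON = {
    "locant": ["1-", "2-", "3-"],
    "multiplier": ["di", "tri"],
    "substituent": ["methyl", "ethyl", "chloro"],
    "stem": ["meth", "eth", "prop", "but"],
    "suffix": ["ane", "ene"],
}

# A regular grammar over token classes, written as a finite automaton:
# state -> {allowed token class: next state}. State names are illustrative.
TRANSITIONS = {
    "start": {
        "locant": "after_locant",
        "multiplier": "after_multiplier",
        "substituent": "start",
        "stem": "after_stem",
    },
    "after_locant": {"multiplier": "after_multiplier", "substituent": "start"},
    "after_multiplier": {"substituent": "start"},
    "after_stem": {"suffix": "end"},
    "end": {},
}
ACCEPTING = {"end"}


def tokenize(name: str, state: str = "start") -> Optional[List[Tuple[str, str]]]:
    """Split `name` into (class, text) tokens, trying only the token classes
    the grammar allows in the current state and backtracking on dead ends.
    Returns None if no grammatical tokenization exists."""
    if not name:
        return [] if state in ACCEPTING else None
    for token_class, next_state in TRANSITIONS[state].items():
        for text in LEXICON[token_class]:
            if name.startswith(text):
                rest = tokenize(name[len(text):], next_state)
                if rest is not None:
                    return [(token_class, text)] + rest
    return None


if __name__ == "__main__":
    for example in ["2-methylpropane", "dichloromethane", "methylane"]:
        print(example, "->", tokenize(example))
```

In this sketch the name "methylane" is rejected: neither the split "methyl" + "ane" nor "meth" + "ylane" is permitted by the grammar, so the function returns None instead of producing a plausible-looking but invalid token sequence.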

Cited by 181 publications (152 citation statements)
References 23 publications

“…SMILES, SDF, and InChI strings are common structural representation formats for chemical entities, which can be directly used for structure search operations or the generation of physico-chemical properties. In contrast, each IUPAC name is converted to the corresponding structure using the OPSIN library [39], before any chemical object is created and subsequently preprocessed. If the chemical (protein, DNA or RNA molecule) input is submitted in FASTA format, every sequence is either identified as a nucleotide or peptide sequence type.…”
Section: The Classification Process
Citation type: mentioning
confidence: 99%
“…OPSIN [23], the Open Parser for Systematic IUPAC nomenclature, converts plain-text chemical nomenclature to machine readable CML or InChi formats.…”
Section: Cheminformatics
Citation type: mentioning
confidence: 99%
“…Besides this basic cheminformatics functionality, Bioclipse also supports chemical names. There is search functionality to find chemicals in ChemSpider and PubChem, and with the OPSIN [29] plugin handling IUPAC names is trivial. Further support is provided by a plugin for the Chemical Resolver Identifier (cactus.nci.nih.gov/chemical/structure).…”
Section: Cheminformatics
Citation type: mentioning
confidence: 99%