2019
DOI: 10.1021/acs.jcim.8b00669

Molecular Structure Extraction from Documents Using Deep Learning

Abstract: Chemical structure extraction from documents remains a hard problem due to both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally, but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting performance of current approaches include the diversity in visual styles …

Cited by 84 publications (76 citation statements)
References 23 publications
“…Moreover, as the need for harvesting the large amounts of published data grows, the demand for methods for easily mining structures from papers and patent data is also growing. Optical Character Recognition (OCR) systems, relying on a variety of ML and probabilistic pattern recognition techniques, were created to translate 2D depictions of chemical structures to standard chemical representations [ 146 148 ]. Nonetheless, the development of OCR systems can be hindered by the images’ resolutions, the computational interpretations of chemical abbreviations, and the nature of the image representation, which can be embedded in text, in figures containing multiple structures, or in reaction pathways, and can be represented as either a skeletal formula or a Markush structure.…”
Section: AI Applications Within Drug Discovery Using Molecular Repres…
Citation type: mentioning
confidence: 99%
“…In 2019, Staker et al. [41] presented a data-driven, deep learning based approach for OCSR called Molecular Structure Extraction from Documents Using Deep Learning (MSE-DUDL). The system uses two types of networks in the backend: a segmentation network and a structure prediction network.…”
Section: Machine-learning-based Systems
Citation type: mentioning
confidence: 99%
“…To the best of our knowledge, only two published methods were fully based on machine learning applied on the raw image data. MSE-DUDL [14] was published in 2019. It contains a segmentation network to extract molecule images from other components of the input page, coupled to a molecular recognition network.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
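
The citation statements above describe MSE-DUDL as a two-stage system: a segmentation network that isolates molecule images on a page, followed by a structure prediction network that reads each crop. The data flow between the two stages might be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: every name here (`extract_structures`, `toy_segment`, `toy_predict`, the `Box` layout) is hypothetical, and the two "networks" are trivial stand-in functions rather than trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # page as rows of pixel values (0 = background)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


@dataclass
class Extraction:
    box: Box        # where the candidate structure was found on the page
    structure: str  # whatever the second stage emits (e.g. a SMILES string)


def crop(page: Image, box: Box) -> Image:
    """Cut the region described by `box` out of the page."""
    top, left, bottom, right = box
    return [row[left:right] for row in page[top:bottom]]


def extract_structures(
    page: Image,
    segment: Callable[[Image], List[Box]],
    predict: Callable[[Image], str],
) -> List[Extraction]:
    """Stage 1 proposes regions; stage 2 reads each cropped region."""
    return [Extraction(box, predict(crop(page, box))) for box in segment(page)]


# --- Hypothetical stand-ins for the two networks (assumptions, not real models) ---

def toy_segment(page: Image) -> List[Box]:
    """Return one box around all non-background pixels, or no box at all."""
    rows = [r for r, row in enumerate(page) if any(row)]
    cols = [c for row in page for c, value in enumerate(row) if value]
    if not rows:
        return []
    return [(min(rows), min(cols), max(rows) + 1, max(cols) + 1)]


def toy_predict(region: Image) -> str:
    """Placeholder for a structure prediction model; reports the crop size only."""
    return f"<structure from {len(region)}x{len(region[0])} crop>"


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
results = extract_structures(page, toy_segment, toy_predict)
```

Passing the two stages in as callables keeps the sketch agnostic about how either model is built; in a real system each would be replaced by a trained network, and the segmentation stage would typically return several boxes per page.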