Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development

McEntire, Robin; Szalkowski, Debbie; Butler, James A.; Kuo, Michelle S.; Chang, Min S.; Chang, Man Kit; Freeman, Darren; McQuay, Sarah; Patel, Jagruti; McGlashen, Michael L.; Cornell, Wendy D.; Xu, Jinghai J.

doi:10.1016/j.drudis.2016.03.006

Cited by 16 publications

(6 citation statements)

References 19 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“… 2 Newer methods to overcome these challenges include the application of natural language processing (NLP) algorithms, which are sophisticated machine-learning algorithms that can be applied to large volumes of unstructured free text and semi-structured clinical data (eg, physician progress notes or radiology reports). 3 NLP methods have been applied to multiple clinical areas involving unstructured text and big data, 4 , 5 , 6 , 7 , 8 but less often to identify chronic disease cohorts for population management. 9 , 10 Echocardiography data contain detailed clinical information but are generally semi-structured with unstructured free text sections and not feasible to extract manually on a large scale.…”

Section: Introductionmentioning

confidence: 99%

Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records

Solomon

Tabada

Allen

et al. 2021

Cardiovascular Digital Health Journal

View full text Add to dashboard Cite

BACKGROUND Systematic case identification is critical to improving population health, but widely used diagnosis codebased approaches for conditions like valvular heart disease are inaccurate and lack specificity.OBJECTIVE To develop and validate natural language processing (NLP) algorithms to identify aortic stenosis (AS) cases and associated parameters from semi-structured echocardiogram reports and compare their accuracy to administrative diagnosis codes.METHODS Using 1003 physician-adjudicated echocardiogram reports from Kaiser Permanente Northern California, a large, integrated healthcare system (.4.5 million members), NLP algorithms were developed and validated to achieve positive and negative predictive values . 95% for identifying AS and associated echocardiographic parameters. Final NLP algorithms were applied to all adult echocardiography reports performed between 2008 and 2018 and compared to ICD-9/10 diagnosis code-based definitions for AS found from 14 days before to 6 months after the procedure date.RESULTS A total of 927,884 eligible echocardiograms were identified during the study period among 519,967 patients. Application of the final NLP algorithm classified 104,090 (11.2%) echocardiograms with any AS (mean age 75.2 years, 52% women), with only 67,297 (64.6%) having a diagnosis code for AS between 14 days before and up to 6 months after the associated echocardiogram. Among those without associated diagnosis codes, 19% of patients had hemodynamically significant AS (ie, greater than mild disease).CONCLUSION A validated NLP algorithm applied to a systemwide echocardiography database was substantially more accurate than diagnosis codes for identifying AS. Leveraging machine learningbased approaches on unstructured electronic health record data can facilitate more effective individual and population management than using administrative data alone.

show abstract

Section: Introductionmentioning

confidence: 99%

Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records

Solomon

Tabada

Allen

et al. 2021

Cardiovascular Digital Health Journal

View full text Add to dashboard Cite

show abstract

“…In addition legal jurisdictions with respect to where the data is stored can prove problematic as different thresholds exist in different parts of the world [27]. These limitations have not completely stymied recent drug discovery initiatives which have combined web-scraping with data-mining technologies to ensure that the latest and most relevant information related to small molecule drug discovery from a range of different sources is being considered at the implementation stage of drug discovery projects (see e.g [28]).…”

Section: Body Of the Textmentioning

confidence: 99%

The Benefits of In Silico Modeling to Identify Possible Small-Molecule Drugs and Their Off-Target Interactions

Zloh

Kirton

2018

Future Med. Chem.

View full text Add to dashboard Cite

The research into the use of small molecules as drugs continues to be a key driver in the development of molecular databases, computer-aided drug design software and collaborative platforms. The evolution of computational approaches is driven by the essential criteria that a drug molecule has to fulfill, from the affinity to targets to minimal side effects while having adequate absorption, distribution, metabolism, and excretion (ADME) properties. A combination of ligand- and structure-based drug development approaches is already used to obtain consensus predictions of small molecule activities and their off-target interactions. Further integration of these methods into easy-to-use workflows informed by systems biology could realize the full potential of available data in the drug discovery and reduce the attrition of drug candidates.

show abstract

“…To address the issues concerning consumption of raw “crowd expertise” researchers have leveraged technologies such as natural language processing (NLP) and knowledge discovery. In particular, the application of NLP techniques for deriving sentiment (on a selected scale) has been a popular direction.…”

Section: Introductionmentioning

confidence: 99%

Using Stack Overflow content to assist in code review

Sharma

Sodhi

2019

Softw Pract Exp

View full text Add to dashboard Cite

An essential goal for programmers is to minimize the cost of identifying and correcting defects in source code. Code review is commonly used for identifying programming defects. However, manual code review has some shortcomings:(1) it is time-consuming and (2) outcomes are subjective and depend on the skills of reviewers. An automated approach for assisting in code reviews is thus highly desirable. We present a tool for assisting in code review and results from our experiments evaluating the tool in different scenarios. The tool leveraged content available from professional programmer support forums (eg, StackOverflow.com) to determine potential defectiveness of a given piece of source code.The defectiveness is expressed on the scale of {Likely defective, neutral, unlikely to be defective}. The basic idea employed in the tool is (1) to identify a set P of discussion posts on Stack Overflow such that each p ∈ P contains source code fragment(s), which sufficiently resemble the input code C being reviewed, and (2) to determine the likelihood of C being defective by considering all p ∈ P.A novel aspect of our approach is to use document fingerprinting for comparing two pieces of source code. Our choice of document fingerprinting technique is inspired by source code plagiarism detection tools where it has proven to be very successful. In the experiments that we performed to verify the effectiveness of our approach, source code samples from more than 300 GitHub open-source repositories were taken as input. An F1 score of 0.94 has been achieved in identifying correct/relevant results. KEYWORDS automated software engineering, code review, crowd knowledge, software development, Stack Overflow INTRODUCTIONWhat are we trying to do and why is it important?We present a novel tool that assists in carrying out effective code reviews. Identifying and fixing buggy code consume significant time and resources in a software development project. Code review by peers 1 and experienced programmers is an effective method 2,3 for identifying potentially buggy codes. However, the process of code review is slow, and quality of results depends on the skills and experience of the reviewers involved. Moreover, a code review carried out by an Shipra Sharma and Balwinder Sodhi contributed equally to this work.

show abstract

Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development

Cited by 16 publications

References 19 publications

Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records

Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records

The Benefits of In Silico Modeling to Identify Possible Small-Molecule Drugs and Their Off-Target Interactions

Using Stack Overflow content to assist in code review

Contact Info

Product

Resources

About