Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications

Tarasova, O.; Biziukova, Nadezhda; Filimonov, Dmitry; Poroikov, Vladimir; Nicklaus, Marc C.

doi:10.1021/acs.jcim.9b00164

Cited by 18 publications

(14 citation statements)

References 39 publications

“…However, quantitative predictions obtained using our web-service for large and diverse databases may be incorrectly biased towards high activity because the training sets with quantitative data are biased towards the high active substances. This problem is solved by classification models that could be recommended as the first choice for prediction [28].…”

Section: Discussion (mentioning)

confidence: 99%

(Q)SAR Models of HIV-1 Protein Inhibition by Drug-Like Compounds

et al. 2019

Self Cite

Despite the achievements of antiretroviral therapy, discovery of new anti-HIV medicines remains an essential task because the existing drugs do not provide a complete cure for the infected patients, exhibit severe adverse effects, and lead to the appearance of resistant strains. To predict the interaction of drug-like compounds with multiple targets for HIV treatment, ligand-based drug design approach is widely applied. In this study, we evaluated the possibilities and limitations of (Q)SAR analysis aimed at the discovery of novel antiretroviral agents inhibiting the vital HIV enzymes. Local (Q)SAR models are based on the analysis of structure–activity relationships for molecules from the same chemical class, which significantly restrict their applicability domain. In contrast, global (Q)SAR models exploit data from heterogeneous sets of drug-like compounds, which allows their application to databases containing diverse structures. We compared the information for HIV-1 integrase, protease and reverse transcriptase inhibitors available in the EBI ChEMBL, NIAID HIV/OI/TB Therapeutics, and Clarivate Analytics Integrity databases as the sources for (Q)SAR training sets. Using the PASS and GUSAR software, we developed and validated a variety of (Q)SAR models, which can be further used for virtual screening of new antiretrovirals in the SAVI library. The developed models are implemented in the freely available web resource AntiHIV-Pred.

“…We used the set of 148 publications abstracts collected from NCBI PubMed. We used the workflow developed earlier (Tarasova et al, 2019). In this workflow we were focused on the publications that included the description of HIV inhibitors and included the details of biological experiments used for their testing.…”

Section: Algorithm Realization (mentioning)

confidence: 99%

“…Besides, the more pressing the problem for humanity is, the more articles devoted to this problem can be found in the repositories of scientific publications. The extraction of records from scientific publications provides the opportunity to analyze the information derived from primary sources; therefore, such an approach helps to obtain the most contemporary information (Cash, 2004; Tarasova et al, 2015, 2019; Saik et al, 2016). Currently, text-mining technologies aimed at rapid automated extraction of specific information are under rigorous development.…”

Section: Introduction (mentioning)

confidence: 99%

Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies

et al. 2020

Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and development of new strategies for the treatment of various diseases and pathological conditions. The texts of publications represent a primary source of information, which is especially important to collect the data of the highest quality due to the immediate obtaining information, in comparison with databases. In our study, we aimed at the development and testing of an approach to the named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on the conditional random fields, which provides recognition of NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest leads to the possibility of extracting the NEs strongly associated with the subject. To test the applicability of our approach, we have applied it for the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments performed provide the estimations of evaluating the accuracy of recognition of chemical NEs and proteins (genes). The precision of the chemical NEs recognition is over 0.91; recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of proteins and genes names is over 0.86; recall is 0.83; while F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms our suggestion about the possibility of extracting the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients i.e., the group of HIV-positive individuals with an ability to maintain an undetectable HIV-1 viral load overtime in the absence of antiretroviral therapy. 
Analysis of the results obtained provides insights into the function of proteins that can be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment.
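The abstract above describes the F1-score as the harmonic mean of precision and recall. As a minimal sketch of that arithmetic, the snippet below recomputes F1 from the rounded precision and recall values quoted in the abstract. The function name `f1_score` is illustrative and is not taken from the cited work; because the abstract reports precision as "over" the quoted figures, the recomputed values are only approximate and come out slightly below the reported F1-scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract (precision is reported as "over" these).
chemical_f1 = f1_score(precision=0.91, recall=0.86)
protein_f1 = f1_score(precision=0.86, recall=0.83)

print(f"chemical NEs:     F1 = {chemical_f1:.3f}")
print(f"protein/gene NEs: F1 = {protein_f1:.3f}")
```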

“…To identify signaling pathways, first we manually mapped the initial entities, which were extracted by text-mining, to UniProt Accession numbers and obtained a list of 46 human proteins. Pathways enriched with 46 genes, were identified from the KEGG database [18] using the “Enrichr” R package. We selected pathways which included at least 3 genes from the 46 ones and adjusted the p-value to less than 0.05.…”

Section: Methods (mentioning)

confidence: 99%
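
The selection rule quoted in the Methods statement above (at least 3 of the 46 genes in a pathway, adjusted p-value below 0.05) can be sketched as a simple filter. This is not the citing authors' code and does not call the Enrichr package; the `PathwayResult` record, the `select_pathways` function, and the example rows are all hypothetical and serve only to illustrate the two thresholds.

```python
from dataclasses import dataclass


@dataclass
class PathwayResult:
    """Hypothetical row of an enrichment table (not the Enrichr output format)."""
    name: str
    overlap_genes: int   # how many of the input genes fall in this pathway
    adjusted_p: float    # p-value after multiple-testing adjustment


def select_pathways(results, min_genes=3, alpha=0.05):
    """Keep pathways with at least min_genes input genes and adjusted p < alpha."""
    return [
        r for r in results
        if r.overlap_genes >= min_genes and r.adjusted_p < alpha
    ]


# Made-up example rows; names and numbers are placeholders, not real results.
results = [
    PathwayResult("pathway_A", overlap_genes=5, adjusted_p=0.001),
    PathwayResult("pathway_B", overlap_genes=2, adjusted_p=0.0004),  # too few genes
    PathwayResult("pathway_C", overlap_genes=4, adjusted_p=0.20),    # not significant
    PathwayResult("pathway_D", overlap_genes=3, adjusted_p=0.049),
]

selected = select_pathways(results)
print([r.name for r in selected])
```

Both conditions must hold at once, so a pathway with a very small adjusted p-value is still dropped if fewer than three of the input genes map to it.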

“…In addition, there are a lot of data on the molecular mechanisms of HIV infection, regarding multiple pathways of virus–host interactions and the development of novel therapeutic approaches. Text and data mining approaches can be helpful for fast and accurately extracting information about chemical compounds and their biological activities, as well as proteins associated with molecular mechanisms of disease development [17, 18]. In this study, we applied text and data mining approaches to identify possible molecular pathways shared by HIV-1 and SARS-CoV-2.…”

Section: Introduction (mentioning)

confidence: 99%

Data and Text Mining Help Identify Key Proteins Involved in the Molecular Mechanisms Shared by SARS-CoV-2 and HIV-1

et al. 2020

Self Cite

Viruses can be spread from one person to another; therefore, they may cause disorders in many people, sometimes leading to epidemics and even pandemics. New, previously unstudied viruses and some specific mutant or recombinant variants of known viruses constantly appear. An example is a variant of coronaviruses (CoV) causing severe acute respiratory syndrome (SARS), named SARS-CoV-2. Some antiviral drugs, such as remdesivir as well as antiretroviral drugs including darunavir, lopinavir, and ritonavir are suggested to be effective in treating disorders caused by SARS-CoV-2. There are data on the utilization of antiretroviral drugs against SARS-CoV-2. Since there are many studies aimed at the identification of the molecular mechanisms of human immunodeficiency virus type 1 (HIV-1) infection and the development of novel therapeutic approaches against HIV-1, we used HIV-1 for our case study to identify possible molecular pathways shared by SARS-CoV-2 and HIV-1. We applied a text and data mining workflow and identified a list of 46 targets, which can be essential for the development of infections caused by SARS-CoV-2 and HIV-1. We show that SARS-CoV-2 and HIV-1 share some molecular pathways involved in inflammation, immune response, cell cycle regulation.
