Background Interactions of microbes and diseases are of great importance for biomedical research. However, large-scale of microbe–disease interactions are hidden in the biomedical literature. The structured databases for microbe–disease interactions are in limited amounts. In this paper, we aim to construct a large-scale database for microbe–disease interactions automatically. We attained this goal via applying text mining methods based on a deep learning model with a moderate curation cost. We also built a user-friendly web interface that allows researchers to navigate and query required information. Results Firstly, we manually constructed a golden-standard corpus and a sliver-standard corpus (SSC) for microbe–disease interactions for curation. Moreover, we proposed a text mining framework for microbe–disease interaction extraction based on a pretrained model BERE. We applied named entity recognition tools to detect microbe and disease mentions from the free biomedical texts. After that, we fine-tuned the pretrained model BERE to recognize relations between targeted entities, which was originally built for drug–target interactions or drug–drug interactions. The introduction of SSC for model fine-tuning greatly improved detection performance for microbe–disease interactions, with an average reduction in error of approximately 10%. The MDIDB website offers data browsing, custom searching for specific diseases or microbes, and batch downloading. Conclusions Evaluation results demonstrate that our method outperform the baseline model (rule-based PKDE4J) with an average $$F_1$$ F 1 -score of 73.81%. For further validation, we randomly sampled nearly 1000 predicted interactions by our model, and manually checked the correctness of each interaction, which gives a 73% accuracy. The MDIDB webiste is freely avaliable throuth http://dbmdi.com/index/
Motivation Understanding chemical–gene interactions (CGIs) is crucial for screening drugs. Wet experiments are usually costly and laborious, which limits relevant studies to a small scale. On the contrary, computational studies enable efficient in-silico exploration. For the CGI prediction problem, a common method is to perform systematic analyses on a heterogeneous network involving various biomedical entities. Recently, graph neural networks become popular in the field of relation prediction. However, the inherent heterogeneous complexity of biological interaction networks and the massive amount of data pose enormous challenges. This paper aims to develop a data-driven model that is capable of learning latent information from the interaction network and making correct predictions. Results We developed BioNet, a deep biological networkmodel with a graph encoder–decoder architecture. The graph encoder utilizes graph convolution to learn latent information embedded in complex interactions among chemicals, genes, diseases and biological pathways. The learning process is featured by two consecutive steps. Then, embedded information learnt by the encoder is then employed to make multi-type interaction predictions between chemicals and genes with a tensor decomposition decoder based on the RESCAL algorithm. BioNet includes 79 325 entities as nodes, and 34 005 501 relations as edges. To train such a massive deep graph model, BioNet introduces a parallel training algorithm utilizing multiple Graphics Processing Unit (GPUs). The evaluation experiments indicated that BioNet exhibits outstanding prediction performance with a best area under Receiver Operating Characteristic (ROC) curve of 0.952, which significantly surpasses state-of-theart methods. For further validation, top predicted CGIs of cancer and COVID-19 by BioNet were verified by external curated data and published literature.
Motivation Automatic recognition of chemical structures from molecular images provides an important avenue for the rediscovery of chemicals. Traditional rule-based approaches that rely on expert knowledge and fail to consider all the stylistic variations of molecular images usually suffer from cumbersome recognition processes and low generalization ability. Deep learning-based methods that integrate different image styles and automatically learn valuable features are flexible, but currently under-researched and have limitations, and are therefore not fully exploited. Results MICER, an encoder-decoder-based, reconstructed architecture for molecular image captioning, combines transfer learning, attention mechanisms, and several strategies to strengthen effectiveness and plasticity in different datasets. The effects of stereochemical information, molecular complexity, data volume, and pre-trained encoders on MICER performance were evaluated. Experimental results show that the intrinsic features of the molecular images and the sub-model match have a significant impact on the performance of this task. These findings inspire us to design the training dataset and the encoder for the final validation model, and the experimental results suggest that the MICER model consistently outperforms the state-of-the-art methods on four datasets. MICER was more reliable and scalable due to its interpretability and transfer capacity and provides a practical framework for developing comprehensive and accurate automated molecular structure identification tools to explore unknown chemical space. Availability https://github.com/Jiacai-Yi/MICER Supplementary information Supplementary data are available at Bioinformatics online.
<abstract> <p>Clinical event detection (CED) is a hot topic and essential task in medical artificial intelligence, which has attracted the attention from academia and industry over the recent years. However, most studies focus on English clinical narratives. Owing to the limitation of annotated Chinese medical corpus, there is a lack of relevant research about Chinese clinical narratives. The existing methods ignore the importance of contextual information in semantic understanding. Therefore, it is urgent to research multilingual clinical event detection. In this paper, we present a novel encoder-decoder structure based on pre-trained language model for Chinese CED task, which integrates contextual representations into Chinese character embeddings to assist model in semantic understanding. Compared with existing methods, our proposed strategy can help model harvest a language inferential skill. Besides, we introduce the punitive weight to adjust the proportion of loss on each category for coping with class imbalance problem. To evaluate the effectiveness of our proposed model, we conduct a range of experiments on test set of our manually annotated corpus. We compare overall performance of our proposed model with baseline models on our manually annotated corpus. Experimental results demonstrate that our proposed model achieves the best precision of 83.73%, recall of 86.56% and F1-score of 85.12%. Moreover, we also evaluate the performance of our proposed model with baseline models on minority category samples. We discover that our proposed model obtains a significant increase on minority category samples.</p> </abstract>
Background Electronic medical records contain a variety of valuable medical information for patients. So, when we are able to recognize and extract risk factors for disease from EMRs of patients with cardiovascular disease (CVD), and are able to use them to predict CVD, we have the ability to automatically process clinical texts, resulting in an improved accuracy of supporting doctors for the clinical diagnosis of CVD. In the case where CVD is becoming more worldwide, predictive CVD based on EMRs has been studied by many researchers to address this important aspect of improving diagnostic efficiency. Methods This paper proposes an Enhanced Character-level Deep Convolutional Neural Networks (EnDCNN) model for cardiovascular disease prediction. Results On the manually annotated Chinese EMRs corpus, our risk factor identification extraction model achieved 0.9073 of F-score, our prediction model achieved 0.9516 of F-score, and the prediction result is better than the most previous methods. Conclusions The character-level model based on text region embedding can well map risk factors and their labels as a unit into a vector, and downsampling plays a crucial role in improving the training efficiency of deep CNN. What’s more, the shortcut connections with pre-activation used in our model architecture implements dimension-matching free in training.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.