Background: Enzymatic and chemical reactions are key for understanding biological processes in cells. Curated databases of chemical reactions exist, but they struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature, and partially replace expert curation, but such machine learning frameworks often require a large amount of labeled training data and thus lack scalability for both larger document corpora and new relationship types.
Results: We developed an application of Snorkel, a weakly supervised learning framework, for extracting chemical reaction relationships from biomedical literature abstracts. For this work, we defined a chemical reaction relationship as the transformation of chemical A to chemical B. We built and evaluated our system on small annotated sets of chemical reaction relationships from two corpora: curated bacteria-related abstracts from the MetaCyc database (MetaCyc_Corpus) and a more general set of abstracts annotated with the MeSH (Medical Subject Headings) term Bacteria (Bacteria_Corpus; a superset of MetaCyc_Corpus). For the MetaCyc_Corpus, we obtained 84% precision and 41% recall (55% F1 score). Extending to the more general Bacteria_Corpus decreased precision to 62% with only a four-point drop in recall to 37% (46% F1 score). Overall, the Bacteria_Corpus contained two orders of magnitude more candidate chemical reaction relationships (nine million candidates vs 68,000 candidates) and had a larger class imbalance (2.5% positives vs 5% positives) than the MetaCyc_Corpus. In total, we extracted 6871 chemical reaction relationships from nine million candidates in the Bacteria_Corpus.
Conclusions: With this work, we built a database of chemical reaction relationships from almost 900,000 scientific abstracts without a large training set of labeled annotations. Further, we showed that our initial application, built on MetaCyc documents enriched with chemical reactions, generalizes to a broader set of articles related to bacteria.
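As a quick consistency check on the figures reported above, the F1 score can be recomputed as the harmonic mean of precision and recall. This is a minimal sketch; the function name `f1_score` and the dictionary layout are illustrative choices, not part of the paper's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported = {
    "MetaCyc_Corpus": (0.84, 0.41),
    "Bacteria_Corpus": (0.62, 0.37),
}

for corpus, (precision, recall) in reported.items():
    # Prints the F1 score rounded to a whole percentage.
    print(f"{corpus}: F1 = {f1_score(precision, recall):.0%}")
```

Rounded to whole percentages, both values agree with the 55% and 46% F1 scores stated in the abstract.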
Several statistical methods have been proposed for testing gene(G)-environment(E) interactions under additive risk models using genome-wide association study data. However, these approaches make strong assumptions about the underlying genetic model, such as dominant or recessive effects, and are known to be less robust when the true genetic model is unknown. We aim to develop a robust trend test employing a likelihood ratio test for detecting G-E interaction under an additive risk model, while incorporating the G-E independence assumption to increase power. We used a constrained likelihood to impose two sets of constraints for (i) the linear trend effect of genotype and (ii) the additive joint effects of G and E. To incorporate the G-E independence assumption, a retrospective likelihood was used rather than a standard prospective likelihood. Numerical investigation suggests that the proposed tests are more powerful than tests assuming dominant, recessive, or general models under various parameter settings and under both likelihoods. Incorporation of the independence assumption enhances efficiency by 2.5-fold. We applied the proposed methods to examine gene-smoking interaction for lung cancer and gene-APOE*4 interaction for Alzheimer’s disease, identifying two interactions between APOE*4 and loci MS4A and BIN1 at genome-wide significance that were replicated using independent data.
This paper tackles the challenge of colorizing grayscale images. We take a deep convolutional neural network approach and treat the task as classification over a finite set of possible colors. Similarly to a recent paper, we implement a loss and a prediction function that favor realistic, colorful images rather than "true" ones. We show that a rather lightweight architecture inspired by the U-Net, and trained on a reasonable number of pictures of landscapes, achieves satisfactory results on this specific subset of pictures. We show that data augmentation significantly improves the performance and robustness of the model, and provide visual analysis of the prediction confidence. We show an application of our model, extending the task to video colorization. We suggest a way to smooth color predictions across frames, without the need to train a recurrent network designed for sequential inputs.
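The abstract does not say how predictions are smoothed across frames. One plausible approach, shown here purely as an assumption and not as the authors' method, is an exponential moving average over the per-frame color-class probabilities of each pixel. The function names, the weight `alpha`, and the toy probabilities are all hypothetical.

```python
def smooth_frames(frame_probs, alpha=0.6):
    """Exponentially smooth color-class probabilities across frames.

    frame_probs: one probability list per frame, for a single pixel.
    alpha: weight given to the current frame (hypothetical value).
    """
    smoothed = []
    previous = None
    for probs in frame_probs:
        if previous is None:
            current = list(probs)  # first frame has no history
        else:
            # Convex combination, so each result still sums to 1.
            current = [alpha * p + (1 - alpha) * q
                       for p, q in zip(probs, previous)]
        smoothed.append(current)
        previous = current
    return smoothed


def predicted_classes(frame_probs):
    """Return the index of the most probable color class per frame."""
    return [max(range(len(p)), key=p.__getitem__) for p in frame_probs]


# Toy data: three color classes, one pixel, four frames.
# The third frame briefly favors class 2 (a flicker).
raw = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.2, 0.5],
    [0.7, 0.2, 0.1],
]

print(predicted_classes(raw))                 # [0, 0, 2, 0]
print(predicted_classes(smooth_frames(raw)))  # [0, 0, 0, 0]
```

In this toy case the single-frame flicker to class 2 is suppressed. A lower `alpha` smooths more strongly but reacts more slowly to genuine color changes between frames.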
Evaluating gene by environment (G$\times$E) interaction under an additive risk model (i.e. additive interaction) has gained wider attention. Recently, statistical tests have been proposed for detecting additive interaction that utilize an assumption of G-E independence to boost power and that do not rely on restrictive genetic models such as dominant or recessive models. However, a major limitation of these methods is a sharp increase in type I error when this assumption is violated. Our goal is to develop a robust test for additive G$\times$E interaction under the trend effect of genotype, applying an empirical Bayes-type shrinkage estimator of the relative excess risk due to interaction (RERI). The proposed method uses a set of constraints to impose the trend effect of genotype and builds an estimator that data-adaptively shrinks a RERI estimator obtained under a general model for G-E dependence using a retrospective likelihood framework. Numerical study under varying levels of departure from G-E independence shows that the proposed method is robust against violation of the independence assumption while providing an adequate balance between bias and efficiency compared to existing methods. We applied the proposed method to the genetic data of Alzheimer’s disease and lung cancer.
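For readers unfamiliar with the quantity being estimated, the relative excess risk due to interaction has a standard definition for a binary gene and a binary exposure: RERI = RR(G and E) - RR(G only) - RR(E only) + 1, with the unexposed non-carriers as the reference group. The sketch below illustrates only this definition, not the shrinkage estimator or the trend constraints proposed in the paper. The function name and the example relative risks are hypothetical.

```python
def reri(rr_g: float, rr_e: float, rr_ge: float) -> float:
    """Relative excess risk due to interaction for binary G and E.

    rr_g:  relative risk for carriers without the exposure
    rr_e:  relative risk for exposed non-carriers
    rr_ge: relative risk for exposed carriers
    All three are relative to unexposed non-carriers.
    """
    return rr_ge - rr_g - rr_e + 1.0


# Hypothetical relative risks chosen for illustration only.
print(reri(rr_g=1.5, rr_e=2.0, rr_ge=2.5))  # 0.0: joint effect is exactly additive
print(reri(rr_g=1.5, rr_e=2.0, rr_ge=4.0))  # 1.5: joint effect exceeds additivity
```

A value of zero corresponds to no additive interaction, which is the null hypothesis that tests of additive interaction target.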