Background: A significant portion (about 8% in the human genome) of mammalian mRNA sequences contains AU (Adenine and Uracil) rich elements or AREs at their 3' untranslated regions (UTR). These mRNA sequences are usually stable. However, an increasing number of observations have been made of unstable species, possibly depending on certain elements such as Alu repeats. ARE motifs are repeats of the tetramer AUUU and a monomer A at the end of the repeats ((AUUU) n A). The importance of AREs in biology is that they make certain mRNA unstable. Protooncogene, such as c-fos, c-myc, and c-jun in humans, are associated with AREs. Although it has been known that the increased number of ARE motifs caused the decrease of the half-life of mRNA containing ARE repeats, the exact mechanism is as of yet unknown. We analyzed the occurrences of AREs and Alu and propose a possible mechanism for how human mRNA could acquire and keep AREs at its 3' UTR originating from Alu repeats.
The human colorectal carcinoma cell line (Caco-2) is a commonly used in-vitro test that predicts the absorption potential of orally administered drugs. In-silico prediction methods, based on the Caco-2 assay data, may increase the effectiveness of the high-throughput screening of new drug candidates. However, previously developed in-silico models that predict the Caco-2 cellular permeability of chemical compounds use handcrafted features that may be dataset-specific and induce over-fitting problems. Deep Neural Network (DNN) generates high-level features based on non-linear transformations for raw features, which provides high discriminant power and, therefore, creates a good generalized model. We present a DNN-based binary Caco-2 permeability classifier. Our model was constructed based on 663 chemical compounds with in-vitro Caco-2 apparent permeability data. Two hundred nine molecular descriptors are used for generating the high-level features during DNN model generation. Dropout regularization is applied to solve the over-fitting problem and the non-linear activation. The Rectified Linear Unit (ReLU) is adopted to reduce the vanishing gradient problem. The results demonstrate that the high-level features generated by the DNN are more robust than handcrafted features for predicting the cellular permeability of structurally diverse chemical compounds in Caco-2 cell lines.
With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene–patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene–patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at ; the information is updated bimonthly.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.