BackgroundThe reconstruction of gene regulatory network (GRN) from gene expression data can discover regulatory relationships among genes and gain deep insights into the complicated regulation mechanism of life. However, it is still a great challenge in systems biology and bioinformatics. During the past years, numerous computational approaches have been developed for this goal, and Bayesian network (BN) methods draw most of attention among these methods because of its inherent probability characteristics. However, Bayesian network methods are time consuming and cannot handle large-scale networks due to their high computational complexity, while the mutual information-based methods are highly effective but directionless and have a high false-positive rate.ResultsTo solve these problems, we propose a Candidate Auto Selection algorithm (CAS) based on mutual information and breakpoint detection to restrict the search space in order to accelerate the learning process of Bayesian network. First, the proposed CAS algorithm automatically selects the neighbor candidates of each node before searching the best structure of GRN. Then based on CAS algorithm, we propose a globally optimal greedy search method (CAS + G), which focuses on finding the highest rated network structure, and a local learning method (CAS + L), which focuses on faster learning the structure with little loss of quality.ConclusionResults show that the proposed CAS algorithm can effectively reduce the search space of Bayesian networks through identifying the neighbor candidates of each node. In our experiments, the CAS + G method outperforms the state-of-the-art method on simulation data for inferring GRNs, and the CAS + L method is significantly faster than the state-of-the-art method with little loss of accuracy. Hence, the CAS based methods effectively decrease the computational complexity of Bayesian network and are more suitable for GRN inference.
The important role of non coding RNAs (ncRNAs) in the cell has made their identification a critical issue in the biological research. However, traditional approaches such as PT-PCR and Northern Blot are costly. With recent progress in bioinformatics and computational prediction technology, the discovery of ncRNAs has become realistically possible. This paper aims to introduce major computational approaches in the identification of ncRNAs, including homologous search, de novo prediction and mining in deep sequencing data. Furthermore, related software tools have been compared and reviewed along with a discussion on future improvements.
In discovering disease etiology and pathogenesis, the associations between MicroRNAs (miRNAs) and diseases play a critical role. Given known miRNA-disease associations (MDAs), how to uncover potential MDAs is an important problem. To solve this problem, most of the existing methods regard known MDAs as positive samples and unknown ones as negative samples, and then predict possible MDAs by iteratively revising the negative samples. However, simply viewing unknown MDAs as negative samples introduces erroneous information, which may result in poor predication performance. To avoid such defects, we present a novel method using only positive samples to predict MDAs by latent features extraction (LFEMDA). We design a new approach to construct the miRNAs similarity matrix. LFEMDA integrates the disease similarity matrix, the known MDAs and the miRNAs similarity matrix to identify potential MDAs. By introducing miRNAs and diseases knowledge as the auxiliary variables, the method can converge to give the optimal solution in each iteration. We conduct experiments on high-association diseases and new diseases datasets, in which our method shows better performance than that of other methods. We also carry out a case study on breast neoplasms to further demonstrate the capacity of our method in uncovering potential MDAs.
Motivation: Pentatricopeptide repeat (PPR) is a triangular pentapeptide repeat domain that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods focusing on the sequence, physical, and chemical properties as well as the amino acid composition, and mix the features. The Max-Relevant-Max-Distance (MRMD) technique is applied to reduce the feature dimension. Classification uses the random forest, J48, and naïve Bayes with 10-fold cross-validation.Results: Combining two of the feature extraction methods with the random forest classifier produces the highest area under the curve of 0.9848. Using MRMD to reduce the dimension improves this metric for J48 and naïve Bayes, but has little effect on the random forest results.Availability and Implementation: The webserver is available at: http://server.malab.cn/MixedPPR/index.jsp.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.