Supplementary data are available at Bioinformatics online.
Background: Multilayered hierarchical gene regulatory networks (ML-hGRNs) are very important for understanding the genetic regulation of biological pathways. However, there are currently no computational algorithms available for directly building ML-hGRNs that regulate biological pathways. Results: A bottom-up graphic Gaussian model (GGM) algorithm was developed for constructing an ML-hGRN operating above a biological pathway using small- to medium-sized microarray or RNA-seq data sets. The algorithm first placed the genes of a pathway at the bottom layer and began to construct an ML-hGRN by evaluating all triple-gene combinations, each consisting of two pathway genes and one regulatory gene. The algorithm retained all triples in which the regulatory gene significantly interfered with the two paired pathway genes. The regulatory genes with the highest interference frequencies were kept as the second layer, and the number kept was determined by an optimization function. Thereafter, the algorithm was applied recursively to build the ML-hGRN layer by layer until the defined number of layers was obtained or the procedure terminated automatically. Conclusions: We validated the algorithm and demonstrated its high efficiency in constructing ML-hGRNs governing biological pathways. The algorithm is instrumental for biologists to learn the hierarchical regulators associated with a given biological pathway from even small-sized microarray or RNA-seq data sets. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0981-1) contains supplementary material, which is available to authorized users.
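The triple-gene evaluation described in the abstract above can be illustrated with a minimal sketch. It assumes, as a simplification not confirmed by the abstract, that "interference" is measured as the drop from the plain correlation of two pathway genes to their first-order partial correlation given the regulatory gene. The function names, the `min_drop` threshold, and the use of a fixed cutoff in place of a significance test are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length expression vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    denom = sqrt(va * vb)
    return cov / denom if denom else 0.0  # constant vector: treat as uncorrelated


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return (rxy - rxz * ryz) / denom if denom else 0.0  # guard |r| == 1


def rank_regulators(pathway, regulators, min_drop=0.3):
    """Count, per regulator, the pathway-gene pairs it 'interferes' with.

    Assumption: a regulator interferes with a pair when conditioning on it
    reduces |correlation| by at least min_drop (a hypothetical cutoff that
    stands in for the significance test mentioned in the abstract).
    """
    counts = Counter()
    for (_, x), (_, y) in combinations(pathway.items(), 2):
        rxy = pearson(x, y)
        for name, z in regulators.items():
            if abs(rxy) - abs(partial_corr(x, y, z)) >= min_drop:
                counts[name] += 1
    return counts.most_common()  # highest interference frequency first
```

Regulators at the top of the returned list would be candidates for the next layer. How many to keep is decided by an optimization function in the original work, which this sketch does not attempt to reproduce.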
Despite their important roles, the regulators of most metabolic pathways and biological processes remain elusive, and methods for identifying them are intensively sought after. We developed a novel algorithm called triple-gene mutual interaction (TGMI) for identifying these regulators using high-throughput gene expression data. TGMI first calculated the regulatory interactions within triple gene blocks (two pathway genes and one transcription factor (TF)) using conditional mutual information, and then identified significantly interacting triple gene blocks using a novel mutual interaction measure (MIM), which was shown to reflect the strength of regulatory interactions within each triple gene block. TGMI calculated the MIM for each triple gene block and then examined its statistical significance using the bootstrap. Finally, the frequencies of all TFs present in the significantly interacting triple gene blocks were calculated and ranked. Upon evaluating multiple pathways in plants, animals and yeast, we showed that the TFs with higher frequencies were usually genuine pathway regulators. Comparison of TGMI with several other algorithms demonstrated its higher accuracy. Therefore, TGMI will be a valuable tool that can help biologists identify regulators of metabolic pathways and biological processes from the rapidly growing body of high-throughput gene expression data in public repositories.
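The conditional mutual information step mentioned in the abstract above can be sketched as follows. This is a minimal illustration on discretized expression values, assuming equal-width binning (the `discretize` helper and its `bins` parameter are hypothetical choices). It does not reproduce the MIM itself or the bootstrap test, neither of which is defined in the abstract.

```python
from collections import Counter
from math import log2


def discretize(values, bins=2):
    """Equal-width binning of expression values into integer levels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) in bits, estimated from discrete samples by counting.

    Uses sum over (x, y, z) of p(x,y,z) * log2[p(z) p(x,y,z) / (p(x,z) p(y,z))],
    which in terms of raw counts is log2[n_z * n_xyz / (n_xz * n_yz)].
    """
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    return sum(
        (c / n) * log2(n_z[k] * c / (n_xz[i, k] * n_yz[j, k]))
        for (i, j, k), c in n_xyz.items()
    )
```

For a triple gene block, `x` and `y` could hold the discretized levels of the two pathway genes and `z` those of the TF; which gene plays the conditioning role in TGMI is an assumption here.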
Background: Present knowledge indicates that a multilayered hierarchical gene regulatory network (ML-hGRN) often operates above a biological pathway. Although the ML-hGRN is very important for understanding how a pathway is regulated, there is almost no computational algorithm for directly constructing ML-hGRNs. Results: A backward elimination random forest (BWERF) algorithm was developed for constructing the ML-hGRN operating above a biological pathway. For each pathway gene, BWERF used a random forest model to calculate the importance values of all transcription factors (TFs) to this pathway gene recursively, excluding a portion (e.g. 1/10) of the least important TFs in each round of modeling. In each round, the importance values of the remaining TFs to the pathway gene were updated and ranked, and the procedure continued until only one TF remained in the list. After that, the importance values of a TF to all pathway genes were aggregated and fitted to a Gaussian mixture model to determine which TFs to retain for the regulatory layer immediately above the pathway layer. The TFs acquired for this second layer were then set as the new bottom layer to infer the next upper layer, and this process was repeated until an ML-hGRN with the expected number of layers was obtained. Conclusions: BWERF improved the accuracy of constructing ML-hGRNs because it used backward elimination to exclude noise genes and aggregated the individual importance values to determine TF retention. We validated BWERF by using it to construct ML-hGRNs operating above the mouse pluripotency maintenance pathway and the Arabidopsis lignocellulosic pathway. Compared to GENIE3, BWERF showed an improvement in recognizing authentic TFs regulating a pathway. Compared to the bottom-up Gaussian graphical model algorithm we developed for constructing ML-hGRNs, BWERF can construct ML-hGRNs with significantly fewer edges, which enables biologists to choose the implicit edges for experimental validation.
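The backward elimination loop described in the abstract above can be sketched for a single pathway gene. To stay dependency-free, this sketch uses the absolute Pearson correlation as a stand-in importance score; it is not a random forest, and unlike random forest importances it does not change as TFs are removed. The function names, the `score` callback, and the rule that an eliminated TF keeps its last computed score are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def abs_pearson_importance(target, tfs):
    """Stand-in scorer (NOT a random forest): |Pearson r| of each TF with the target."""
    mt = mean(target)
    vt = sum((t - mt) ** 2 for t in target)
    scores = {}
    for name, expr in tfs.items():
        me = mean(expr)
        cov = sum((t - mt) * (e - me) for t, e in zip(target, expr))
        ve = sum((e - me) ** 2 for e in expr)
        denom = sqrt(vt * ve)
        scores[name] = abs(cov / denom) if denom else 0.0
    return scores


def backward_eliminate(target, tfs, score=abs_pearson_importance, drop_fraction=0.1):
    """Rank TFs for one pathway gene by repeated scoring and pruning.

    Each round rescores the remaining TFs, then drops the least important
    fraction (at least one TF) until a single TF remains. A dropped TF keeps
    the score from its last round (an assumption of this sketch).
    """
    remaining = dict(tfs)
    latest = {}
    while remaining:
        scores = score(target, remaining)
        latest.update(scores)
        if len(remaining) == 1:
            break
        n_drop = max(1, int(len(remaining) * drop_fraction))
        for name in sorted(remaining, key=scores.get)[:n_drop]:
            del remaining[name]
    return sorted(latest.items(), key=lambda kv: kv[1], reverse=True)
```

A model-based scorer with the same `(target, tfs) -> {name: importance}` shape could be passed as `score`. The later steps in the abstract, aggregating importances across pathway genes and fitting a Gaussian mixture model to choose the retained TFs, are not covered here.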
One of the most important attributes of a genome is its size, which can to a large extent reflect the evolutionary history and diversity of a plant species. However, studies on genome size diversity within a species are still very limited. This study aims to clarify the variation in genome size of Chinese jujube and sour jujube, and to determine whether genome size is associated with geographical variation. We measured the genome sizes of 301 cultivars of Chinese jujube and 81 genotypes of sour jujube by flow cytometry. Ten fruit traits (weight, vertical diameter, horizontal diameter, size, total acids, total sugar, monosaccharide, disaccharide, soluble solids, and ascorbic acid) were measured in 243 cultivars of Chinese jujube. The estimated genome sizes of Chinese jujube cultivars ranged from 300.77 Mb to 640.94 Mb, with an average of 408.54 Mb; the largest share of cultivars (20.93%) fell in the range of 334.787 to 368.804 Mb. Genome size varied somewhat with geographical distribution. The results showed a weakly significant positive correlation (p < 0.05) between genome size and fruit size, vertical diameter, horizontal diameter, and weight in Chinese jujube. The estimated sour jujube genome sizes ranged from 346.93 Mb to 489.44 Mb, with the largest share of genotypes (24.69%) falling in the range of 418.185 to 432.436 Mb. The average genome size of sour jujube genotypes was 423.55 Mb, about 15 Mb larger than that of Chinese jujube. There is a high level of variation in genome size within both Chinese jujube cultivars and sour jujube genotypes. Genome contraction may have occurred during the domestication of Chinese jujube. This study is the first large-scale investigation of genome size variation in both Chinese jujube and sour jujube, and it provides useful resources and data for characterizing genome evolution within a species and during domestication in plants.