Seed germination is a complex trait of key ecological and agronomic significance. Few genetic factors regulating germination have been identified, and the means by which their concerted action controls this developmental process remains largely unknown. Using publicly available gene expression data from Arabidopsis thaliana, we generated a condition-dependent network model of global transcriptional interactions (SeedNet) that shows evidence of evolutionary conservation in flowering plants. The topology of the SeedNet graph reflects the biological process, including two state-dependent sets of interactions associated with dormancy or germination. SeedNet highlights interactions between known regulators of this process and predicts, with 50% accuracy, the germination-associated function of uncharacterized hub nodes connected to them. An intermediate transition region between the dormancy and germination subdomains is enriched with genes involved in cellular phase transitions. The phase transition regulators SERRATE and EARLY FLOWERING IN SHORT DAYS, both from this region, affect seed germination, indicating that conserved mechanisms control transitions in cell identity in plants. The SeedNet dormancy region is strongly associated with vegetative abiotic stress response genes. These data suggest that seed dormancy, an adaptive trait that arose late in evolution, evolved by co-opting existing genetic pathways that regulate cellular phase transitions and abiotic stress responses. SeedNet is available as a community resource (http://vseed.nottingham.ac.uk) to aid dissection of this complex trait and of gene function in diverse processes.
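SeedNet's actual edge-inference procedure is not described here. As a generic illustration only, the sketch below builds a co-expression network by linking genes whose expression profiles have an absolute Pearson correlation at or above a threshold, then ranks hub nodes by degree (number of connections). The threshold, gene names, and expression values are hypothetical. Restricting the profiles to samples from particular conditions (for example, seed samples) is one simple way to make such a network condition-dependent.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_network(expr: Dict[str, List[float]], threshold: float) -> Dict[str, Set[str]]:
    """Undirected adjacency sets: connect genes whose |r| >= threshold."""
    graph: Dict[str, Set[str]] = {gene: set() for gene in expr}
    for a, b in combinations(expr, 2):
        if abs(pearson(expr[a], expr[b])) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hubs(graph: Dict[str, Set[str]], top_n: int) -> List[str]:
    """Genes with the most connections (ties broken alphabetically)."""
    return sorted(graph, key=lambda g: (-len(graph[g]), g))[:top_n]


if __name__ == "__main__":
    # Hypothetical expression profiles for five genes across four samples.
    expr = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.0],
        "C": [4.0, 3.0, 2.0, 1.0],
        "D": [1.0, 3.0, 2.0, 4.0],
        "E": [3.0, 1.0, 1.0, 3.0],
    }
    net = build_network(expr, threshold=0.9)
    print({g: sorted(n) for g, n in net.items()})
    print(hubs(net, 3))
```

Here A, B, and C are perfectly (anti-)correlated and form a triangle, while D (|r| = 0.8 with each of them) and E (r = 0) fall below the threshold and stay unconnected. Real co-expression pipelines typically add steps such as significance testing or soft thresholding; this sketch omits them for brevity.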
kNN is one of the most popular data mining methods for classification, but it often performs poorly when the distance metric is chosen inappropriately or when numerous class-irrelevant features are present. Linear feature transformation methods have been widely applied to extract class-relevant information and improve kNN classification, but their effectiveness is very limited in many applications. Kernels have also been used to learn powerful non-linear feature transformations, but these methods fail to scale to large datasets. In this paper, we present DNet-kNN, a scalable non-linear feature mapping method that improves kNN classification in a large-margin framework using a deep neural network pretrained with Restricted Boltzmann Machines. DNet-kNN can be used both for classification and for supervised dimensionality reduction. Experimental results on two benchmark handwritten digit datasets and one newsgroup text dataset show that DNet-kNN performs much better than large-margin kNN with a linear mapping and than kNN based on a deep autoencoder pretrained with Restricted Boltzmann Machines.
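As a rough illustration of the two ingredients named above, the sketch below classifies by majority vote among the k nearest neighbors in a mapped feature space, and computes a large-margin hinge penalty in the style of large-margin nearest neighbor (LMNN). The function `toy_map` is a hypothetical fixed non-linear mapping standing in for the pretrained deep network; no RBM pretraining or optimization loop is shown, and the exact objective used by DNet-kNN may differ from this one.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Mapping = Callable[[Vector], List[float]]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_map(x: Vector) -> List[float]:
    """Hypothetical stand-in for a learned non-linear mapping f(x).
    A trained deep network would replace this element-wise tanh."""
    return [math.tanh(v) for v in x]


def knn_predict(train: List[Tuple[Vector, int]], query: Vector, k: int,
                f: Mapping) -> int:
    """Majority vote among the k training points nearest to `query`,
    with distances measured in the mapped space f(.)."""
    fq = f(query)
    ranked = sorted(train, key=lambda item: sq_dist(f(item[0]), fq))
    votes = Counter(label for _, label in ranked[:k])
    # Counter.most_common orders equal counts by first occurrence,
    # so ties go to the label seen first among the nearest neighbors.
    return votes.most_common(1)[0][0]


def large_margin_loss(data: List[Tuple[Vector, int]],
                      targets: Dict[int, List[int]],
                      f: Mapping, margin: float = 1.0) -> float:
    """LMNN-style hinge penalty in the mapped space: for each point i,
    each of its target neighbors j, and each differently labeled point l,
    add max(0, margin + d(i, j) - d(i, l))."""
    mapped = [f(x) for x, _ in data]
    loss = 0.0
    for i, neighbors in targets.items():
        for j in neighbors:
            d_ij = sq_dist(mapped[i], mapped[j])
            for l, (_, label_l) in enumerate(data):
                if label_l != data[i][1]:
                    loss += max(0.0, margin + d_ij - sq_dist(mapped[i], mapped[l]))
    return loss


if __name__ == "__main__":
    identity: Mapping = lambda x: list(x)
    train = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((5.0, 5.0), 1), ((5.0, 6.0), 1)]
    print(knn_predict(train, (0.2, 0.3), k=3, f=identity))  # prints 0
    print(knn_predict(train, (4.0, 4.5), k=3, f=toy_map))   # prints 1
    # An impostor (label 1) sits between point 0 and its target neighbor 1.
    tight = [((0.0, 0.0), 0), ((2.0, 0.0), 0), ((1.0, 0.0), 1)]
    print(large_margin_loss(tight, {0: [1]}, f=identity))   # prints 4.0
```

The hinge term is zero once every differently labeled point lies at least `margin` farther from point i than its target neighbors do; training a mapping to drive this penalty down is what pulls same-class neighbors together and pushes impostors away before the kNN vote.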
Transaction Datalog (abbreviated TD) is a concurrent programming language that provides process modeling, database access, and advanced transactions. This paper illustrates the use of TD for specifying and simulating workflows, with examples based on the needs of a high-throughput genome laboratory. In addition to traditional database support, these needs include synchronization of work, cooperation between concurrent workflows, and non-serializable access to shared resources. After illustrating workflows, we use TD to explore their computational complexity in data-intensive applications. We show, for instance, that workflows can be vastly more complex than traditional database transactions, largely because concurrent processes can interact and communicate via the database (i.e., one process can read what another process writes). We then investigate the sources of this complexity, focusing on features for data modeling and process modeling. We show that by carefully controlling these features, the complexity of workflows can be reduced substantially. Finally, we develop a sub-language called fully bounded TD that provides a practical blend of modeling features while minimizing complexity.